WO2020165627A1 - Limited flooding in dense graphs - Google Patents

Limited flooding in dense graphs

Info

Publication number
WO2020165627A1
Authority: WO (WIPO PCT)
Prior art keywords: node, lsa, adjacency, network, nodes
Application number: PCT/IB2019/051133
Other languages: French (fr)
Inventor: David Ian Allan
Original Assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/IB2019/051133
Publication of WO2020165627A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00: Routing or path finding of packets in data switching networks
    • H04L45/32: Flooding
    • H04L45/48: Routing tree calculation
    • H04L45/484: Routing tree calculation using multiple routing trees
    • H04L45/64: Routing or path finding of packets in data switching networks using an overlay routing layer

Definitions

  • Embodiments of the invention relate to the field of interior gateway protocol (IGP) operation; and more specifically, to a process for reducing the quantity of link state advertisements (LSAs) that are flooded in dense mesh networks.
  • A mesh network is a type of network topology in which each node is directly interconnected with many of the other nodes in the network.
  • Hierarchical mesh networks are often utilized in data centers to connect the nodes of the data center, e.g., servers, switches and similar devices. In such environments, the number of nodes and interconnects can be very high, which results in a mesh network referred to as a dense mesh network.
  • Interior gateway protocols (IGPs) are protocols used to exchange routing information between the nodes within an autonomous system (AS).
  • This routing information can be used for routing of data using network layer protocols (e.g., the Internet Protocol (IP)).
  • IGPs can exchange routing information using link state advertisements (LSAs) or similar messages.
  • LSAs enable nodes to share their local network topology and other information with other nodes in the network (e.g., within an AS).
  • the nodes may share their adjacencies, i.e., the links to neighbor nodes within the network.
  • These LSAs can be flooded within the network such that each node in the network can determine a complete network topology that enables the computation of routing information for the network layer protocols.
  • LSAs are flooded by a node when the node has a change of local topology or during similar events.
  • The normal procedure when a node receives an LSA that it has not received before is that the receiving node refloods the LSA on all adjacencies except the one of arrival, as sketched below.
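  • As a point of reference, this conventional reflooding behavior can be sketched as follows (a minimal Python illustration; the function name reflood and the dict-based adjacency model are assumptions of this sketch, not from the patent):

        # Conventional IGP flooding: relay a newly seen LSA on every
        # adjacency except the one of arrival; duplicates are never relayed.
        def reflood(adjacencies, seen, node, lsa_id, arrival):
            if lsa_id in seen[node]:
                return []                       # already seen: discard
            seen[node].add(lsa_id)
            return [nbr for nbr in adjacencies[node] if nbr != arrival]

        adjacencies = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
        seen = {n: set() for n in adjacencies}
        print(reflood(adjacencies, seen, "B", "lsa-1", "A"))  # -> ['C']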
  • a method of managing the forwarding of link state advertisements in an interior gateway protocol includes computing a first spanning tree with a first root for a network, computing a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree collectively defining a constrained flooding topology for the network, and flooding link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
  • a network device executes the method of managing the forwarding of link state advertisements in the IGP.
  • the network device includes a non-transitory computer readable medium having stored therein a constrained flooding manager, and a processor coupled to the non-transitory computer readable medium.
  • the processor executes the constrained flooding manager.
  • the constrained flooding manager computes a first spanning tree with a first root for a network, computes a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network, and floods link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
  • a computing device implements a plurality of virtual machines.
  • the plurality of virtual machines implements network function virtualization (NFV), where at least one virtual machine from the plurality of virtual machines implements the method of managing the forwarding of link state advertisements in an interior gateway protocol.
  • the computing device includes a non-transitory computer readable medium having stored therein a constrained flooding manager, and a processor coupled to the non-transitory computer readable medium.
  • the processor executes the at least one virtual machine from the plurality of virtual machines.
  • the virtual machine executes the constrained flooding manager.
  • the constrained flooding manager computes a first spanning tree with a first root for a network, computes a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network, and floods link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
  • a control plane device is in communication with a plurality of data plane nodes in a software defined networking (SDN) network. The control plane device implements the method of managing the forwarding of link state advertisements in an IGP.
  • the control plane device includes a non-transitory computer readable medium having stored therein a constrained flooding manager, and a processor coupled to the non-transitory computer readable medium.
  • the processor executes the constrained flooding manager.
  • the constrained flooding manager computes a first spanning tree with a first root for a network, computes a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network and floods link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
  • Figure 1 is a diagram of one embodiment of a physical topology of a dense mesh network.
  • Figure 2 is a diagram of one embodiment of a set of spanning trees for link state advertisement (LSA) forwarding in a dense mesh network.
  • Figure 3 is a flowchart of one embodiment of a process for establishing constrained flooding in a network.
  • Figure 4A is a flowchart of one embodiment of a process for handling LSAs received at a node in the dense mesh network.
  • Figure 4B is a flowchart of one embodiment of a process for handling LSAs generated by local events.
  • Figure 5 is a diagram of one example embodiment of LSA forwarding for a subset of nodes where there is overlap between the low spanning tree and the high spanning tree.
  • Figure 6 is a flowchart of a process implemented by the nodes that self-identify as ‘needy’ nodes.
  • Figure 7 is a flowchart of a process implemented by the nodes that self-identify as ‘constellation’ nodes.
  • Figure 8 is a diagram of an example network topology that illustrates a special constrained flooding case.
  • Figure 9A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.
  • Figure 9B illustrates an exemplary way to implement a special-purpose network device according to some embodiments of the invention.
  • Figure 9C illustrates various exemplary ways in which virtual network elements (VNEs) may be coupled according to some embodiments of the invention.
  • Figure 9D illustrates a network with a single network element (NE) on each of the NDs, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.
  • Figure 9E illustrates the simple case of where each of the NDs implements a single NE, but a centralized control plane has abstracted multiple of the NEs in different NDs into (to represent) a single NE in one of the virtual network(s), according to some embodiments of the invention.
  • Figure 9F illustrates a case where multiple VNEs are implemented on different NDs and are coupled to each other, and where a centralized control plane has abstracted these multiple VNEs such that they appear as a single VNE within one of the virtual networks, according to some embodiments of the invention.
  • Figure 10 illustrates a general-purpose control plane device with centralized control plane (CCP) software 1050, according to some embodiments of the invention.
  • the following description describes methods and apparatus for managing link state advertisements (LSAs) for the interior gateway protocol (IGP).
  • the embodiments provide an LSA forwarding process that reduces the quantity of LSAs that the IGP floods to the nodes in a network while ensuring that all nodes in the network receive at least one copy of every LSA introduced into the LSA forwarding process.
  • the process provides a significant reduction in forwarded LSA traffic in dense mesh networks or similar networks.
  • the LSA forwarding process utilizes at least two equal and diverse paths between any pair of IGP nodes in the network. In some embodiments of the LSA forwarding process two spanning trees are computed to be utilized for the LSA forwarding in the network.
  • The embodiments of the LSA forwarding process can utilize the tie breaking algorithm used in Institute of Electrical and Electronics Engineers (IEEE) 802.1aq Shortest Path Bridging as an element of construction of the LSA forwarding topology. While the embodiments are described as applied to LSA forwarding, one skilled in the art would understand that the process is applicable to similar network management scenarios.
  • References in the specification to "one embodiment," "an embodiment," "an example embodiment," etc. indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • Bracketed text and blocks with dashed borders may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
  • "Coupled" is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
  • "Connected" is used to indicate the establishment of communication between two or more elements that are coupled with each other.
  • An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals).
  • An electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors (e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, or a combination of one or more of the preceding) coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data.
  • An electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device.
  • Typical electronic devices also include a set of one or more physical network interface(s) (NI(s)) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices.
  • a physical NI may comprise radio circuitry capable of receiving data from other electronic devices over a wireless connection and/or sending data out to other devices via a wireless connection.
  • This radio circuitry may include transmitter(s), receiver(s), and/or transceiver(s) suitable for radiofrequency communication.
  • the radio circuitry may convert digital data into a radio signal having the appropriate parameters (e.g., frequency, timing, channel, bandwidth, etc.). The radio signal may then be transmitted via antennas to the appropriate recipient(s).
  • The set of physical NI(s) may comprise network interface controller(s) (NICs), also known as network interface cards, network adapters, or local area network (LAN) adapters.
  • The NIC(s) may facilitate connecting the electronic device to other electronic devices, allowing them to communicate via wire by plugging a cable into a physical port connected to a NIC.
  • One or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.
  • a network device is an electronic device that communicatively interconnects other electronic devices on the network (e.g., other network devices, end-user devices).
  • Some network devices are "multiple services network devices" that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video).
  • a member adjacency or adjacent member node refers to an adjacency between a given node and an immediately connected node in a network topology (e.g., an IGP network topology) where both are IGP speakers.
  • An IGP speaker is a node that implements IGP within the physical topology of the network.
  • The constrained flooding topology is a subset of the adjacencies of the physical network topology over which LSAs are flooded.
  • a ‘member’ node is a node that is a part of the constrained flooding topology.
  • a member adjacency indicates that the immediately connected node is a member node and the adjacency is part of the constrained flooding topology. It is possible also that member nodes can be connected by adjacencies that are not part of the constrained flooding topology.
  • a participating node is a node in the network that is an IGP speaker (i.e., a node that implements IGP) that has advertised the capability, and thus the intention, to participate in a constrained flooding topology.
  • a participating adjacency refers to an adjacency, i.e., an immediate link or connection, between two participating nodes.
  • A contiguous set of participant nodes are also member nodes.
  • However, a participant node may be severed from all adjacencies with other participant nodes, and hence can simultaneously be a participant and a non-member node.
  • a non-participating node refers to a node in the network that is an IGP speaker but has not advertised a capability or intent to participate in a constrained flooding topology.
  • a node may be a non-participant when the capability is not implemented, or the implementation may support participation in a constrained flooding topology, but the node has been administratively configured to be a non-participant.
  • a constrained flooding topology is composed of a contiguously connected set of participating nodes.
  • the constrained flooding topology can be constructed from two diversely rooted spanning trees. Thus, there are two spanning trees each with a different root in the network. Every participating node that has more than one participating adjacency is required to be bi-connected to the constrained flooding topology. This is achieved by constrained flooding topology construction. For a bipartite graph or similar hierarchical network, the resulting constrained flooding topology diameter will typically be two times the depth of the tree hierarchy. The compromise in this approach is that a subset of nodes in the network will not see a reduction of the replication burden from current practice when flooding LSAs.
  • The degree of this subset of nodes in the constrained flooding topology, i.e., the number of adjacencies for this subset of nodes, will correspond to the degree of the physical topology, i.e., the number of adjacencies of nodes in the physical topology.
  • the participating nodes can utilize LSAs to flood local topology information and related information to other nodes in the network.
  • a member node may forward a received LSA to adjacent member nodes via the constrained flooding topology.
  • Specific forwarding rules beyond those normally associated with spanning tree forwarding prevent undue flooding of the LSAs.
  • The result of the flooding of LSAs using the constrained flooding topology is that every participant node that has more than one participating adjacency will be bi-connected to the constrained flooding topology and will receive two copies of any flooded LSA in a fault free dense mesh network. Participating nodes that are only singly connected will receive one copy, as would a chain of bi-connected nodes terminating on a singly connected node.
  • Participating nodes may be singly connected to the constrained flooding topology due to degradation of the network, e.g., a failure of a link or node or as an artifact of network design.
  • the participating nodes implement a set of forwarding rules for handling LSAs in the constrained flooding topology. These forwarding rules are described further herein below.
  • a constrained flooding process can be applied to varying network topologies.
  • These network topologies can include those networks that can be represented as pure bipartite graphs, bipartite graphs modified with the addition of intra-tier adjacencies, and hierarchical variations of the above.
  • The advantages of the constrained flooding process will vary according to network topology, with the aforementioned network topologies having discernable reductions in LSA traffic when the constrained flooding process is implemented. For the sake of clarity and conciseness, the dense mesh network is used herein as the primary example.
  • the constrained flooding process constructs two spanning trees to be utilized for the forwarding of LSAs amongst participating nodes.
  • The constrained flooding process can utilize the tie breaking algorithm from the Institute of Electrical and Electronics Engineers (IEEE) 802.1aq standard.
  • The IEEE 802.1aq shortest path bridging process for the construction of the spanning trees is used herein by way of example for spanning tree construction. In particular, the process used is described in clause 28.5 of IEEE 802.1aq.
  • A component of the IEEE 802.1aq tree computation process employed in the embodiments is the tie breaking component.
  • The IEEE 802.1aq tree computation process produces a symmetrically congruent mesh of multicast trees and unicast forwarding whereby the path between any two nodes in the network is symmetric in both directions and congruent for both unicast and multicast traffic.
  • The IEEE 802.1aq tree computation process or a similar tree computation process is used in the generation of two diversely rooted spanning trees that define the constrained flooding topology.
  • The IEEE 802.1aq tree computation tie breaks between equal cost paths, i.e., the IEEE 802.1aq tree computation utilizes a deterministic tie breaking process to select one of a set of equal cost paths when constructing a tree, in this case a spanning tree.
  • a path-id is expressed as a lexicographically sorted list of the node-identifiers (node-ids) for a given equal cost path.
  • the set of equal cost paths is ranked using these path-ids, and the lowest ranking path-id and corresponding path are selected.
  • The ranking is based on the lexicographical sorting. As an example, a path-id 23-39-44-85-98 is ranked lower than a path-id 23-44-59-90-93.
  • If the path-ids are of unequal length, the path-ids with the fewest hops are ranked as being superior to the longer paths, and tie breaking is applied to select between the shorter path-ids.
  • The node-ids used would be the loopback address of each node; therefore each path-id will be unique.
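  • The ranking described above can be illustrated with a short sketch (integer node-ids and the function names here are illustrative assumptions; as noted, the patent's node-ids would be loopback addresses):

        # Tie-breaking between equal cost paths: fewest hops wins outright;
        # among equal-hop paths, the lexicographically lowest sorted list of
        # node-ids (the path-id) wins.
        def path_id(path):
            return sorted(path)

        def select_path(equal_cost_paths):
            shortest = min(len(p) for p in equal_cost_paths)
            candidates = [p for p in equal_cost_paths if len(p) == shortest]
            return min(candidates, key=path_id)

        paths = [[23, 44, 59, 90, 93], [23, 39, 44, 85, 98]]
        print(select_path(paths))  # -> the path with path-id 23-39-44-85-98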
  • The IEEE 802.1aq tree computation process includes the concept of an "algorithm-mask", which is a value XOR'd with the node-ids prior to sorting into path-ids and ranking the paths.
  • This algorithm-mask permits the construction of diverse trees in a dense topology.
  • Two algorithm masks (zero and -1) are used for the construction of the two diverse spanning trees used to define the constrained flooding topology.
  • When computing two trees from the same root, fully diverse trees will be generated provided there are at least two nodes to choose from at each distance from the root.
  • When computing two trees from diverse roots in a tree architecture, diverse nodes will be selected in each tier in the hierarchy as the relay nodes to the next tier. The selection of relay nodes does have implications for root selection, as described further herein below.
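  • A minimal sketch of how the algorithm-mask alters the tie-break outcome (the 16-bit node-ids and mask constants are assumptions of this illustration; the mask of -1 corresponds to all ones, here 0xFFFF):

        # XOR each node-id with the mask before forming the path-id: mask 0
        # favors low node-ids and mask 0xFFFF favors high ones, steering the
        # two spanning trees through different relay nodes.
        MASK_LOW, MASK_HIGH = 0x0000, 0xFFFF

        def masked_path_id(path, mask):
            return sorted(node_id ^ mask for node_id in path)

        def select_path(paths, mask):
            shortest = min(len(p) for p in paths)
            return min((p for p in paths if len(p) == shortest),
                       key=lambda p: masked_path_id(p, mask))

        paths = [[3, 7], [12, 9]]
        print(select_path(paths, MASK_LOW))   # -> [3, 7]  (low tree)
        print(select_path(paths, MASK_HIGH))  # -> [12, 9] (high tree)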
  • The IEEE 802.1aq tree computation process has the property of permitting the pruning of intermediate state as a Dijkstra computation progresses, since equal cost path ties can be immediately evaluated and all paths other than the selected path removed from further consideration. This is desirable when performing a Dijkstra computation in a dense graph as all path permutations do not need to be carried forward during computation. This permits the computation of the spanning tree to be quite fast despite the complexity of a dense mesh.
  • The resulting computational complexity can be expressed as 2N(ln N), where N is the number of nodes in the network.
  • the constrained LSA flooding process depends on tie breaking between sets of node IDs to produce diverse paths, therefore it can place some restrictions on root selection.
  • A root can be selected so that the root's node-id, when XOR'd with the associated algorithm mask, is the lowest ranked node in the local tier in the tree hierarchy. This would be analogous to path-id ranking where the paths were all of length 1.
  • At the same time, the root should not be selected such that its node-id, when XOR'd with the other root's algorithm mask, is the lowest ranked node. This would result in the root also being a transit node for the other spanning tree and produce a scenario whereby a single failure could render both spanning trees incomplete.
  • the embodiments can avoid roots that are directly connected for the low and high spanning trees. If the topology does not permit this to be satisfied purely by root selection, then the inter-root adjacency can be pruned from the graph prior to spanning tree computation to ensure that diverse paths between the roots are used. For a true bipartite graph, there may be no other restrictions on node selection. For a bipartite graph modified with inter-tier links, the roots can be placed in different tiers to ensure a pathological combination of link weights and node-ids does not result in a scenario where a single failure would render the constrained flooding topology incomplete. Other sources of failure may exist that can be addressed by introducing an administrative component to root selection. This, for example, would ensure that both roots were not selected from a common shared risk group.
  • Figure 1 is a diagram of an example network with a full mesh topology.
  • the underlying full mesh of connections between the nodes is provided by way of example and not limitation.
  • an example dense mesh network with 56 nodes numbered 0 to 55 is illustrated.
  • each of the nodes in tier 1 (Nodes 48-55) is connected to each of the nodes in tier 2 (Nodes 24-47).
  • Nodes 24-31 are connected to each of nodes 0 to 7 in tier 3.
  • Nodes 32-39 are connected to each of nodes 8 to 15 in tier 3.
  • Nodes 40-47 are connected to each of nodes 16 to 23 in tier 3.
  • This organization of a dense mesh network is provided by example.
  • a dense mesh network can include any number of nodes and have any number of tiers. The nodes in each tier are connected with a large proportion of the nodes in adjacent tiers.
  • FIG 2 is a diagram of one embodiment of a set of spanning trees for link state advertisement (LSA) forwarding in a dense mesh network.
  • LSA link state advertisement
  • two spanning trees are illustrated connecting each of the nodes in the dense mesh network.
  • the underlying full mesh connections shown in Figure 1 are not illustrated here for clarity.
  • These two spanning trees 101 and 105 collectively form a constrained flooding topology.
  • Spanning tree 101 is rooted 103 at node 48.
  • the root 107 of spanning tree 105 is node 55.
  • Each node in the network has one connection with each of the spanning trees 101 and 105.
  • The spanning trees 101 and 105 have diverse roots and paths to minimize overlap, such that a single failure of a link or node will not prevent at least one copy of an LSA from reaching each node.
  • The roots can be selected by any mechanism that permits fully diverse spanning trees to be computed using a spanning tree computation (e.g., using IEEE 802.1aq).
  • the spanning trees 101 and 105 are capable of propagating LSAs across the constrained flooding topology with a latency comparable to unconstrained flooding of LSAs in the dense mesh network.
  • Each node in the dense mesh topology is an IGP speaker.
  • Each IGP speaker in the network has knowledge of each of the two spanning tree roots and the algorithm mask associated with each.
  • Each participating IGP speaker in the network computes a spanning tree from each of the two roots (using the algorithm mask associated with each root) and from that can determine its own role in the constrained flooding topology.
  • The two spanning trees can be referred to as the "low spanning tree" and the "high spanning tree."
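  • Once both trees are computed, a node can classify each of its adjacencies; a minimal sketch (the parent-dict representation of each tree and the function name are assumptions of this illustration):

        # parents_low / parents_high give each node's next hop toward the
        # low and high roots. An adjacency is upstream if it leads toward a
        # root, downstream if the neighbor hangs below this node, and split
        # ("upstream/downstream") when the two trees disagree.
        def classify_adjacency(parents_low, parents_high, node, nbr):
            up = parents_low.get(node) == nbr or parents_high.get(node) == nbr
            down = parents_low.get(nbr) == node or parents_high.get(nbr) == node
            if not (up or down):
                return "non-member"
            if up and down:
                return "upstream/downstream"
            return "upstream" if up else "downstream"

        parents_low = {"B": "A", "C": "B"}    # node A is the low root
        parents_high = {"B": "C", "A": "B"}   # node C is the high root
        print(classify_adjacency(parents_low, parents_high, "B", "A"))
        # -> 'upstream/downstream' (toward the low root, away from the high)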
  • FIG. 3 is a flowchart of one embodiment of a process for establishing constrained LSA flooding in a network.
  • the dense mesh network begins in a state where the topology is unstable and there are no spanning trees established for constrained LSA flooding.
  • the topology can be unstable where there has not been sufficient time for the nodes in the network to compute routing in response to the change of links and nodes in the network advertised via LSA.
  • the constrained flooding process can begin by initializing a set of timers (e.g., timers T1 and T2) (Block 301).
  • T1 can represent a period of quiescence for LSA advertisements after which a first spanning tree (i.e., either the high or low spanning tree) is computed.
  • T2 can represent a second period of quiescence for LSA advertisements after which a second spanning tree (i.e., the other of the high or low spanning trees) is computed.
  • T1 and T2 can be selected such that one complete spanning tree is always stable. In this embodiment, T1 is less than T2.
  • the timers T1 and T2 can be continuously decremented over time until they expire.
  • The constrained LSA flooding process can await the expiration of the timers while monitoring for received or locally generated LSAs as well as similar events (Block 303). If an LSA is received from an adjacent node, then the LSA is flooded on all interfaces to all adjacent nodes except for the interface on which the LSA was received (Block 305). The constrained LSA flooding process then resets the timers (T1 and T2) (Block 301) and awaits the next event (Block 303).
  • If an LSA was locally generated by the node, e.g., where the node determines a change on its interfaces with adjacent nodes, then the LSA is flooded on all interfaces of the node to all adjacent nodes in the dense mesh network (Block 307). The constrained LSA flooding process then resets the timers (T1 and T2) (Block 301) and awaits the next event (Block 303).
  • When the first timer expires, the root of the first spanning tree is determined using any process or mechanism for determining a diverse set of roots (Block 309).
  • the second root can also be determined at this time.
  • the node can compute the first spanning tree using the first root (Block 311).
  • The first spanning tree can be computed using the IEEE 802.1aq process or a similar spanning tree computation process.
  • The constrained LSA flooding process can continue to monitor for events as the second timer continues to decrement (Block 313). If an LSA is received from an adjacent node, then the LSA is flooded on all interfaces to all adjacent nodes except for the interface on which the LSA was received (Block 305). The constrained LSA flooding process then resets the timers (T1 and T2) (Block 301) and awaits the next event (Block 303). If an LSA was locally generated by the node, e.g., where the node determines a change on its interfaces with adjacent nodes, then the LSA is flooded on all interfaces of the node to all adjacent nodes in the dense mesh network (Block 307). The constrained LSA flooding process then resets the timers (T1 and T2) (Block 301) and awaits the next event (Block 303).
  • When the second timer expires, the node can compute the second spanning tree using the second root (Block 315).
  • The second spanning tree can be computed using the IEEE 802.1aq process or a similar spanning tree computation process.
  • the constrained LSA flooding process then enters a state of having established spanning trees and a stable network topology.
  • the constrained LSA flooding process monitors for events including a received LSA or locally generated LSA (Block 317). If a local LSA is generated or an LSA is received, then the LSA is forwarded according to a set of rules for the constrained flooding topology (Block 319). These rules are discussed in further detail with relation to Figures 4A and 4B.
  • After the received or generated LSAs are flooded according to the rules of the constrained flooding topology, the constrained LSA flooding process resets the timers, i.e., timers T1 and T2 (Block 321), and enters a state where the spanning trees are established, but the network topology is unstable.
  • The constrained LSA flooding process monitors for further events (Block 323). If a local LSA is generated or an LSA is received, then the process floods the LSA according to the rules for the constrained flooding topology as discussed herein below (Block 325).
  • If the timer T1 expires, then the constrained LSA process starts to determine the first and second roots (Block 329) and recomputes the first spanning tree (Block 331). The process then monitors for further events (Block 333), and if the second timer expires then the constrained LSA flooding process computes the second spanning tree (Block 335) before re-entering the spanning tree established and network topology stable state to await further events (Block 317). If additional LSAs are received or generated, then the constrained LSA flooding process floods LSAs according to the set of rules for the constrained flooding topology.
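  • The two-timer logic of Figure 3 can be sketched as follows (the timer values, class name, and stub functions are illustrative assumptions, not from the patent):

        # T1 < T2: any LSA activity restarts both timers; T1 expiry triggers
        # root determination plus the first tree computation, T2 expiry the
        # second, so one complete spanning tree is always stable.
        T1, T2 = 5.0, 10.0

        def determine_roots():
            print("determine first and second roots")

        def compute_tree(which):
            print(f"compute {which} spanning tree")

        class FloodingTimers:
            def __init__(self, now):
                self.reset(now)

            def reset(self, now):
                self.t1_deadline, self.t2_deadline = now + T1, now + T2
                self.first_done = False

            def on_lsa(self, now):
                # Flood the LSA per the current rules, then restart quiescence.
                self.reset(now)

            def on_tick(self, now):
                if not self.first_done and now >= self.t1_deadline:
                    determine_roots()
                    compute_tree("first")
                    self.first_done = True
                elif self.first_done and now >= self.t2_deadline:
                    compute_tree("second")

        timers = FloodingTimers(now=0.0)
        timers.on_tick(6.0)    # -> first tree computed
        timers.on_tick(11.0)   # -> second tree computed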
  • The high and low spanning trees maintained by this process provide a redundant topology. Contrary to the common usage of a spanning tree, in the embodiments the distinction between upstream and downstream adjacencies between nodes is important and is an input to how a participant node further relays any LSAs that are received. Upstream member adjacencies are in the direction of a root, and downstream member adjacencies are in the direction away from the root. The reason for this is that the constrained flooding topology is the combination of the two spanning trees; they do not operate independently.
  • FIG. 4A is a flowchart of one embodiment of a process for handling LSAs received at a node in the dense mesh network.
  • the embodiments do not require that the flooded LSA’s protocol design be modified to include additional information. No additional information is required to associate a received LSA with a given tree, nor is such information needed.
  • the flowchart illustrates the forwarding rules for the constrained LSA flooding process.
  • the constrained LSA flooding process is activated in response to a node in the constrained flooding topology receiving an LSA via an adjacency (i.e., on an interface from an adjacent node in the network) (Block 401).
  • a check is made to determine whether the received LSA has been received previously (Block 403).
  • This check may utilize the information in the local routing table to compare with the information being advertised in the received LSA.
  • The receiving node does not relay LSAs that it has already seen, nor does it add the information to the local information store as it is already there; thus, a previously received LSA is discarded (Block 405).
  • For new LSAs, a check is made whether the received LSA is from a non-participant adjacency (i.e., the LSA is received from an adjacent non-participant node) (Block 407). If the LSA is received from a non-participant adjacency, then the LSA is flooded (i.e., forwarded) to all member adjacencies and non-participant adjacencies except the adjacency on which the LSA is received (i.e., the LSA is forwarded to each adjacent member node and to each adjacent non-participant node except the sending node) (Block 409).
  • If the LSA is received on a member adjacency, then a determination is made whether the member adjacency is upstream, downstream, or both within the constrained flooding topology, where upstream is toward both roots, downstream is away from both roots, and upstream/downstream is a split between the two roots (Block 411).
  • a new LSA received from an upstream member adjacency is relayed to all downstream member adjacencies, irrespective of which spanning tree the adjacencies are part of (i.e., the LSA is forwarded to each downstream adjacent member node) (Block 415).
  • the LSA is flooded to each non-participant adjacency (i.e., to each adjacent non-participating node).
  • a new LSA received from a downstream member adjacency is flooded on all other member adjacencies exclusive of the adjacency of arrival irrespective of which spanning tree the adjacencies are part of (Block 417).
  • the LSA is flooded on all non-participating adjacencies (i.e., the LSA is forwarded to each adjacent non-participating node).
  • A new LSA received from a member adjacency where upstream and downstream is ambiguous is flooded on all other member adjacencies exclusive of the adjacency of arrival, irrespective of which spanning tree the adjacencies are part of (i.e., the LSA is forwarded to each adjacent member node) (Block 413).
  • the LSA is flooded on all non-participating adjacencies (i.e., the LSA is forwarded to each adjacent non-participating node).
  • FIG. 4B is a flowchart of one embodiment of a process for handling LSAs generated by local events. If the LSA to be processed by a node is generated as a local event (Block 451), rather than being received from an adjacent node, then the process floods the LSA on all member adjacencies and on all non-participating adjacencies (i.e., the LSA is forwarded to each adjacent node) (Block 453).
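  • Taken together, the rules of Figures 4A and 4B can be summarized in a small decision sketch (the class labels and the function name relay_targets are illustrative assumptions of this sketch):

        # Given the class of the arrival adjacency (None for a locally
        # generated LSA), return which adjacency classes a new LSA is
        # relayed on. Duplicates are assumed discarded before this point.
        def relay_targets(arrival_class):
            if arrival_class is None:              # locally generated (Fig. 4B)
                return {"member": "all", "non-participant": "all"}
            if arrival_class == "non-participant":
                return {"member": "all", "non-participant": "all but arrival"}
            if arrival_class == "upstream":
                return {"member": "downstream only", "non-participant": "all"}
            # downstream arrival, or ambiguous upstream/downstream arrival
            return {"member": "all but arrival", "non-participant": "all"}

        print(relay_targets("upstream"))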
  • a participating node that is added to the constrained flooding topology will initially not be served by the constrained flooding topology.
  • a participating node adjacent to that node can treat it as a non-participating node until such time as tree re-optimization has completed.
  • typically two adjacent participating nodes will have member adjacencies with the new node, so the ability to flood LSAs between the new node and the constrained flooding topology will have been uninterrupted during the process.
  • the embodiments address nodal behaviors with respect to constraining flooding to member adjacencies.
  • Where the participating nodes are a subset of a larger network, it is possible to advertise the capability to participate in flood reduction.
  • Each participating node uses this information to identify the set of participating adjacencies and to confine the spanning tree computation to the set of participating adjacencies in order to identify the local set of member adjacencies.
  • a node that had a combination of participating and non-participating adjacencies can handle this such that for any new LSA received on a participating adjacency, in addition to the rules for member adjacencies, it would also flood the LSA on all non-participating adjacencies. For any new LSA received on a non-participating adjacency, it would flood the LSA on all member adjacencies.
  • In some embodiments, procedures are designed to permit more than one constrained flooding topology in an IGP domain, in which case participating nodes would have to be administratively configured to associate with a constrained flooding topology instance.
  • the embodiments may include re-optimization of the constrained flooding topology after a topology change. In order to maintain complete convergence, the process may not recompute the spanning trees simultaneously. In the embodiments described above, the computations of spanning trees are separated in time. Re-optimization of the low spanning tree does not take place at the same time as re-optimization of the high spanning tree.
  • the embodiments can reoptimize an incomplete tree first, however this would require the participating nodes to maintain a complete map of all member adjacencies so that a common determination of the most degraded spanning tree and hence the order of re-optimization could be made.
  • A participating node at power up will not be able to establish member links until it has synchronized with the network and the network is stable in the new topology.
  • This node can treat a power up similarly to how a topology change and network re-optimization is treated, the only difference being that it will flood all LSAs received or originated until both spanning trees have stabilized.
  • IEEE 802.1aq included additional mechanisms to prevent looping: a reverse path forwarding check, and digest exchange across adjacencies to ensure IGP synchronization.
  • Routing LSAs are not relayed if they are a duplicate; therefore destructive looping cannot occur, and additional mitigation mechanisms are not required.
  • The constrained flooding topology used by the constrained LSA flooding process ensures that no single failure renders both spanning trees incomplete.
  • the only dual link failure that can render the constrained flooding topology incomplete is if a participant node has failures in both upstream member adjacencies. This can be partially mitigated if the node recognizes this scenario and reverts to flooding on all adjacencies.
  • In the constrained LSA flooding process, surrounding participating nodes that receive the LSA on a non-member adjacency will introduce the LSA into the constrained flooding topology.
  • a pathological scenario is the simultaneous failure of both roots. Root selection can place the roots two hops apart so there will be a constituency of participants that would observe a simultaneous failure of both upstream member adjacencies and revert to normal flooding.
  • FIG. 5 is a diagram of one example embodiment of LSA forwarding for a subset of nodes where there is overlap between the low spanning tree and the high spanning tree.
  • nodes 0 and 55 are the roots of the high and low spanning trees.
  • a sub-set of the links of the high and low spanning trees are shown to illustrate a case of overlap on links between nodes 0 and 31 and between nodes 24 and 55.
  • The nodes originating LSAs on these links need only send one copy of the LSA over the links, whereas otherwise two LSAs would be sent, one for each spanning tree.
  • the roots can treat new LSAs on these links as upstream and flood them on their respective trees where they are the root.
  • The IEEE 802.1aq tie-breaking algorithm is utilized in the process of determining the diverse trees.
  • A generalized form without these elements can be applied while ensuring that all nodes that have more than one interface are diversely bi-connected to a constrained flooding topology.
  • the generalized constrained flooding topology generation process described above produces diverse trees where most nodes are bi-connected.
  • the additional embodiments generalize the constrained flooding topology generation process to increase the variety of topologies where the nodes will be bi-connected in the constrained flooding topology.
  • the previously described process of constrained flooding topology generation will not necessarily result in all nodes being bi-connected to the constrained flooding topology.
  • Nodes that are not bi-connected to a constrained flooding topology generated with the previously described process will have a physical adjacency that is an upstream member adjacency for both spanning trees.
  • the constrained flooding topology generated herein-above will have only a small number of participant nodes that are not bi-connected to the constrained flooding topology.
  • Nodes that have only a single upstream member adjacency to the constrained flooding topology, but that have more than one participant adjacency, are referred to herein as 'needy nodes.'
  • Correcting these situations, to ensure the constrained flooding topology will remain complete across any single failure in a reasonably meshed network of arbitrary topology, can be accomplished by local establishment of a member adjacency between the needy node and one of the adjacent member nodes.
  • the participant nodes immediately adjacent to the needy node can be referred to as constellation nodes.
  • The embodiments provide, as part of the process of establishing the constrained flooding topology, for the needy node and the constellation nodes to independently determine their pairwise relationship and, where appropriate, install a local modification to the constrained flooding topology. This obviates the need for additional protocol exchange to coordinate constrained flooding topology modifications.
  • each of the participant nodes in the network determines whether it has a member adjacency that is an upstream adjacency for both spanning trees that make up the constrained flooding topology. If so it is a needy node and is required to perform the needy node procedures.
  • the participant nodes also check whether there are any adjacent needy nodes. For each adjacent needy node where the participant node is not the upstream member node in the flooding topology, the participant node will perform procedures for a constellation node. Nodes perform both checks as they may simultaneously be both constellation nodes and needy nodes.
  • Figure 6 is a flowchart of a process implemented by the participant nodes that self-identify as 'needy' nodes. This process is implemented by each node that self-identifies as a needy node as described herein above, and is performed excluding any modification made to the spanning trees by needy node processing from consideration, to avoid race conditions.
  • The node can initiate the process by determining whether it has only a single participant adjacency (Block 601). If so, no further action is possible, and the process completes (Block 603).
  • Otherwise, the node checks whether it is physically bi-connected in the constrained flooding topology (Block 605). If the node is determined to be physically bi-connected to participant nodes, then the node modifies its downstream adjacency to include the role of upstream member adjacency (Block 623). If the node is not bi-connected, then a check is made to determine if there are further contiguous upstream single points of failure. Several single points of failure checks can be included in this determination. The immediately upstream node is a single point of failure for a needy node, and the needy node checks whether nodes further upstream form a contiguous chain of single points of failure (e.g., where there is a chain of physically bi-connected nodes or the node has only a single upstream adjacency in the constrained flooding topology).
  • the node traces upstream until a node that is bi-connected to the constrained flooding topology is found or the root is reached (Block 607).
  • When a bi-connected node is found, it is identified as the single point of failure to be avoided when alternate paths from the node are sought. In some cases, this may be the root.
  • the process continues to select paths to the root for the computing node (i.e., the needy node). For each of the constellation nodes of the computing needy node, the process constructs a list of node IDs for the upstream path to each of the roots (Block 609). If some but not all of these paths transit the node previously identified to be the single point of failure to be avoided, then these paths are removed from further consideration unless all remaining paths transit this single point of failure (Block 611). The process then eliminates any paths from further consideration that are longer than a shortest path to the closest root (Block 613).
  • Path identifiers are constructed for each of the remaining paths using lists of node identifiers, which are padded to create an equal number of hops in each path where needed. These path identifiers can then be lexicographically sorted, ranked, or similarly constructed to enable a deterministic selection of one of the remaining paths (Block 619). The adjacency of the selected path is marked with the path identifier as an upstream member adjacency (Block 621).
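  • The padding and deterministic ranking step above can be sketched as follows (the choice of a padding sentinel that sorts after any real node-id, and the function name, are assumptions of this sketch):

        # Surviving candidate paths are padded to equal hop count, then the
        # lowest-ranking padded path-id is chosen deterministically.
        PAD = float("inf")

        def deterministic_choice(candidate_paths):
            longest = max(len(p) for p in candidate_paths)
            def padded_id(path):
                return sorted(path) + [PAD] * (longest - len(path))
            return min(candidate_paths, key=padded_id)

        print(deterministic_choice([[4, 9, 2], [7, 3]]))  # -> [4, 9, 2]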
  • FIG. 7 is a flowchart of a process implemented by the nodes that self-identify as ‘constellation’ nodes. This process is implemented by each node that self-identifies as a constellation node as described herein above and is performed excluding any modification made to the spanning trees by the needy node processing from consideration to avoid race conditions. If the corresponding needy node is physically singly connected to the network, then no other participant nodes will self-identify as a constellation node for that needy node. Otherwise, this process is similar to that of the needy node. The constellation node can initiate the process by determining if the needy node is physically bi-connected in the constrained flooding topology (Block 701).
  • If the needy node is physically bi-connected, then the constellation node is the only constellation node, and the adjacency to the corresponding needy node is modified to include the role of downstream member adjacency (Block 703). If the needy node is not bi-connected, then a check is made to determine the furthest contiguously connected single points of failure. As with the process of the needy node, there are several single points of failure checks that can be included in this determination. The node can check whether the upstream node is a single point of failure, and whether nodes further upstream are single points of failure (e.g., where there is a chain of physically bi-connected nodes or the node has only a single upstream adjacency in the constrained flooding topology).
  • The node can trace upstream until a bi-connected node is found, or the root is reached (Block 707). When a bi-connected node is found, it is identified as the single point of failure to be avoided when alternate paths from the node are sought (note that this may be a root).
  • The process continues to identify alternate paths to the root for the needy node via the set of constellation nodes, including the constellation node performing the computation. For all constellation nodes the process constructs a list of node IDs for the upstream path to each of the roots (Block 709). If some but not all of these paths transit the node that was previously identified to be the single point of failure to be avoided, then these paths are removed from further consideration (Block 711). The process then eliminates any paths from further consideration that are longer than a shortest path from the needy node to the given root.
  • Path identifiers are constructed for each of the remaining paths using lists of node identifiers, which are padded to create an equal number of hops in each path where needed. These path identifiers can then be lexicographically sorted, ranked, or similarly constructed to enable a deterministic selection of one of the remaining paths.
  • FIG. 8 is a diagram of an example network topology that illustrates a special constrained flooding case.
  • In this network topology there are eight participant nodes, numbered 0-3, 5-6, 8, and 10.
  • The two spanning trees of the constrained flooding topology are rooted at participant nodes 1 and 2. All links in this example are equal cost, and only a first copy of a received LSA is reflooded.
  • In this example, an LSA originates from node 0. If the LSA is received by node 5 first on path 0-10-5, then it will be treated as received from a downstream adjacency and forwarded to nodes 2 and 6. If the LSA is received by node 5 first on path 0-6-5, then it will be treated as received from an upstream adjacency and forwarded to node 10.
  • the embodiments can be adapted to handle cases such that an LSA can be received on both an upstream and downstream interface in a random order.
  • the adapted embodiment handles this case by tracking the class of interface of arrival for an LSA.
  • the forwarding rules are adapted such that if an LSA is received that has not been seen before at a node, then the LSA is forwarded according to the class of arrival (i.e., based on upstream or downstream adjacency, where arrival on a non-member adjacency is considered to be the equivalent of a downstream arrival). If an LSA has been received before on an upstream adjacency and is received again on a downstream adjacency, then the LSA will be forwarded on the other upstream adjacencies.
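  • A minimal sketch of this class-of-arrival bookkeeping (the dictionary-based tracking and the return strings are illustrative assumptions):

        # Remember the adjacency class on which each LSA first arrived; a
        # later copy arriving downstream of an LSA first seen upstream is
        # still relayed on the other upstream adjacencies rather than being
        # discarded as a duplicate.
        first_arrival = {}   # lsa_id -> "upstream" | "downstream"

        def on_lsa(lsa_id, arrival_class):
            if lsa_id not in first_arrival:
                first_arrival[lsa_id] = arrival_class
                return f"flood per {arrival_class} arrival rules"
            if first_arrival[lsa_id] == "upstream" and arrival_class == "downstream":
                return "flood on the other upstream adjacencies"
            return "discard duplicate"

        print(on_lsa("lsa-1", "upstream"))
        print(on_lsa("lsa-1", "downstream"))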
  • the embodiments presented herein above can also be adapted for more generalized application with adaptations to the root selection process.
  • the root selection process can be a distributed process performed by all participant nodes.
  • The root selection performs a search of the network topology for all node pairs that are two hops apart, have at least two equal and lowest cost two-hop paths between them, and have no equal cost paths that span less than or greater than two hops; otherwise, flooding topology generation can produce undesirable results. Nodes with two hops between them have the properties that simultaneous dual root failure is visible to the transit nodes, and the result is a minimum flooding diameter by minimizing the worst case, which would be the failure of an intra-root path.
  • the resulting set of possible root pairs are then ranked for selection.
  • the criteria for ranking root pairs can include maximizing a sum of the degree of connectedness of both roots, with a tie break minimizing a delta between the two roots. This option provides a reduction in the number of needy nodes in the constrained flooding topology.
  • Other criteria can include minimizing the distance from a root to the network edge, minimizing or maximizing a spread in root identifiers, minimizing or maximizing a path cost between the roots, and selecting a pair with the lowest or highest node identifiers. Any combination of these criteria can be used to rank or similarly sort the possible root node pairs and make a selection. All participant nodes that implement the root selection will use the same criteria to implement the common ranking algorithm.
  • In one embodiment, the ranking of root pairs selects a root pair with minimum path cost between them. If there is more than one such root pair, then the pair with the maximum sum of connectedness is selected. If further tie breaking is needed, then the pair with the minimum spread in degree of connectedness is selected, and then the pair with the lowest root id.
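  • This example ranking can be expressed as a single sort key (the candidate tuple shape (cost, degree_a, degree_b, id_a, id_b) is an illustrative assumption of this sketch):

        # Rank root pairs: minimum inter-root path cost, then maximum summed
        # degree, then minimum spread in degree, then lowest root id.
        def rank_key(candidate):
            cost, deg_a, deg_b, id_a, id_b = candidate
            return (cost, -(deg_a + deg_b), abs(deg_a - deg_b), min(id_a, id_b))

        candidates = [(2, 8, 8, 48, 55), (2, 9, 6, 24, 31), (3, 10, 10, 0, 1)]
        print(min(candidates, key=rank_key))  # -> (2, 8, 8, 48, 55)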
  • the spanning trees are required to be re-optimized simultaneously rather than independently. Handling needy nodes in the generalized process utilizes both spanning trees. Thus, both are optimized at the same time to avoid artifacts.
  • root selection can be included in the re-optimization process. During periods of instability related to re-optimization, the nodes can treat LSAs received from participant non-member adjacent nodes as from downstream nodes, which are forwarded to all member adjacent nodes.
  • Flooding topology (FT) generation will minimize the number of hops across the network, where per-hop processing of LSAs at each node is expected to dwarf transmission times, and hence is the desirable metric to optimize for; the number of needy nodes is minimized, as a unary metric tends to amplify path diversity; and root selection is simplified, as the hop count is also the cost, which eliminates a set of corner cases to be considered.
  • Figure 9A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.
  • Figure 9A shows NDs 900A-H, and their connectivity by way of lines between 900A-900B, 900B-900C, 900C-900D, 900D-900E, 900E-900F, 900F-900G, and 900A-900G, as well as between 900H and each of 900A, 900C, 900D, and 900G.
  • These NDs are physical devices, and the connectivity between these NDs can be wireless or wired (often referred to as a link).
  • An additional line extending from NDs 900A, 900E, and 900F illustrates that these NDs act as ingress and egress points for the network (and thus, these NDs are sometimes referred to as edge NDs; while the other NDs may be called core NDs).
  • Two of the exemplary ND implementations in Figure 9A are: 1) a special-purpose network device 902 that uses custom application-specific integrated-circuits (ASICs) and a special-purpose operating system (OS); and 2) a general-purpose network device 904 that uses common off-the-shelf (COTS) processors and a standard OS.
  • the special-purpose network device 902 includes networking hardware 910 comprising a set of one or more processor(s) 912, forwarding resource(s) 914 (which typically include one or more ASICs and/or network processors), and physical network interfaces (NIs) 916 (through which network connections are made, such as those shown by the connectivity between NDs 900A-H), as well as non-transitory machine readable storage media 918 having stored therein networking software 920.
  • the networking software 920 may be executed by the networking hardware 910 to instantiate a set of one or more networking software instance(s) 922.
  • Each of the networking software instance(s) 922, and that part of the networking hardware 910 that executes that network software instance form a separate virtual network element 930A-R.
  • Each of the virtual network element(s) (VNEs) 930A-R includes a control communication and configuration module 932A- R (sometimes referred to as a local control module or control communication module) and forwarding table(s) 934A-R, such that a given virtual network element (e.g., 930A) includes the control communication and configuration module (e.g., 932A), a set of one or more forwarding table(s) (e.g., 934A), and that portion of the networking hardware 910 that executes the virtual network element (e.g., 930A).
  • the special-purpose network device 902 is often physically and/or logically considered to include: 1) a ND control plane 924 (sometimes referred to as a control plane) comprising the processor(s) 912 that execute the control communication and configuration module(s) 932A-R; and 2) a ND forwarding plane 926 (sometimes referred to as a forwarding plane, a data plane, or a media plane) comprising the forwarding resource(s) 914 that utilize the forwarding table(s) 934A-R and the physical NIs 916.
  • the ND control plane 924 (the processor(s) 912 executing the control communication and configuration module(s) 932A-R) is typically responsible for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) and storing that routing information in the forwarding table(s) 934A-R, and the ND forwarding plane 926 is responsible for receiving that data on the physical NIs 916 and forwarding that data out the appropriate ones of the physical NIs 916 based on the forwarding table(s) 934A-R.
  • Figure 9B illustrates an exemplary way to implement the special-purpose network device 902 according to some embodiments of the invention.
  • Figure 9B shows a special-purpose network device including cards 938 (typically hot pluggable). While in some embodiments the cards 938 are of two types (one or more that operate as the ND forwarding plane 926 (sometimes referred to as line cards), and one or more that operate to execute the control communication and configuration module(s) 932A-R (sometimes referred to as control cards)), alternative embodiments may combine functionality onto a single card and/or include additional card types (e.g., one additional type of card is referred to as a service card, resource card, or multi-application card).
  • a service card can provide specialized processing (e.g., Layer 4 to Layer 7 services (e.g., firewall, Internet Protocol Security (IPsec), Secure Sockets Layer (SSL) / Transport Layer Security (TLS), Intrusion Detection System (IDS), peer-to-peer (P2P), Voice over IP (VoIP) Session Border Controller, Mobile Wireless Gateways (Gateway General Packet Radio Service (GPRS) Support Node (GGSN), Evolved Packet Core (EPC) Gateway))).
  • a service card may be used to terminate IPsec tunnels and execute the attendant authentication and encryption algorithms. These cards are coupled together through one or more interconnect mechanisms illustrated as backplane 936 (e.g., a first full mesh coupling the line cards and a second full mesh coupling all of the cards).
  • the general-purpose network device 904 includes hardware 940 comprising a set of one or more processor(s) 942 (which are often COTS processors) and physical NIs 946, as well as non-transitory machine-readable storage media 948 having stored therein software 950.
  • processor(s) 942 execute the software 950 to instantiate one or more sets of one or more applications 964A-R. While one embodiment does not implement virtualization, alternative embodiments may use different forms of virtualization.
  • the virtualization layer 954 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 962A-R called software containers that may each be used to execute one (or more) of the sets of applications 964A-R; where the multiple software containers (also called virtualization engines, virtual private servers, or jails) are user spaces (typically a virtual memory space) that are separate from each other and separate from the kernel space in which the operating system is run; and where the set of applications running in a given user space, unless explicitly allowed, cannot access the memory of the other processes.
  • the virtualization layer 954 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and each of the sets of applications 964A-R is run on top of a guest operating system within an instance 962A-R called a virtual machine (which may in some cases be considered a tightly isolated form of software container) that is run on top of the hypervisor - the guest operating system and application may not know they are running on a virtual machine as opposed to running on a “bare metal” host electronic device, or through para-virtualization the operating system and/or application may be aware of the presence of virtualization for optimization purposes.
  • one, some or all of the applications are implemented as unikernel(s), which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application.
  • a unikernel can be implemented to run directly on hardware 940, directly on a hypervisor (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container.
  • embodiments can be implemented fully with unikernels running directly on a hypervisor represented by virtualization layer 954, unikernels running within software containers represented by instances 962A-R, or as a combination of unikernels and the above-described techniques (e.g., unikernels and virtual machines both run directly on a hypervisor, unikernels and sets of applications that are run in different software containers).
  • the instantiation of the one or more sets of one or more applications 964A-R, as well as virtualization if implemented, are collectively referred to as software instance(s) 952.
  • the constrained flooding manager 964A-R is an application that implements the processes described herein above.
  • the virtual network element(s) 960A-R perform similar functionality to the virtual network element(s) 930A-R - e.g., similar to the control communication and configuration module(s) 932A and forwarding table(s) 934A (this virtualization of the hardware 940 is sometimes referred to as network function virtualization (NFV)).
  • while embodiments are described with each instance 962A-R corresponding to one VNE 960A-R, alternative embodiments may implement this correspondence at a finer level of granularity (e.g., line card virtual machines virtualize line cards, control card virtual machines virtualize control cards, etc.); it should be understood that the techniques described herein with reference to a correspondence of instances 962A-R to VNEs also apply to embodiments where such a finer level of granularity and/or unikernels are used.
  • in some embodiments, the constrained flooding manager 964A-R is implemented as part of the instances 922.
  • the virtualization layer 954 includes a virtual switch that provides similar forwarding services as a physical Ethernet switch. Specifically, this virtual switch forwards traffic between instances 962A-R and the physical NI(s) 946, as well as optionally between the instances 962A-R; in addition, this virtual switch may enforce network isolation between the VNEs 960A-R that by policy are not permitted to communicate with each other (e.g., by honoring virtual local area networks (VLANs)).
  • the third exemplary ND implementation in Figure 9A is a hybrid network device 906, which includes both custom ASICs/special-purpose OS and COTS processors/standard OS in a single ND or a single card within an ND.
  • a platform VM, i.e., a VM that implements the functionality of the special-purpose network device 902, could provide for para-virtualization to the networking hardware present in the hybrid network device 906.
  • each of the VNEs receives data on the physical NIs (e.g., 916, 946) and forwards that data out the appropriate ones of the physical NIs (e.g., 916, 946).
  • a VNE implementing IP router functionality forwards IP packets on the basis of some of the IP header information in the IP packet, where IP header information includes source IP address, destination IP address, source port, destination port (where “source port” and “destination port” refer herein to protocol ports, as opposed to physical ports of a ND), transport protocol (e.g., user datagram protocol (UDP), Transmission Control Protocol (TCP)), and differentiated services code point (DSCP) values.
  • Figure 9C illustrates various exemplary ways in which VNEs may be coupled according to some embodiments of the invention.
  • Figure 9C shows VNEs 970A.1-970A.P (and optionally VNEs 970A.Q-970A.R) implemented in ND 900A and VNE 970H.1 in ND 900H.
  • VNEs 970A.1-P are separate from each other in the sense that they can receive packets from outside ND 900A and forward packets outside of ND 900A; VNE 970A.1 is coupled with VNE 970H.1, and thus they communicate packets between their respective NDs; VNEs 970A.2-970A.3 may optionally forward packets between themselves without forwarding them outside of the ND 900A; and VNE 970A.P may optionally be the first in a chain of VNEs that includes VNE 970A.Q followed by VNE 970A.R (this is sometimes referred to as dynamic service chaining, where each of the VNEs in the series of VNEs provides a different service - e.g., one or more layer 4-7 network services). While Figure 9C illustrates various exemplary relationships between the VNEs, alternative embodiments may support other relationships (e.g., more/fewer VNEs, more/fewer dynamic service chains, multiple different dynamic service chains with some common VNEs and some different VNEs).
  • the NDs of Figure 9A may form part of the Internet or a private network; and other electronic devices (not shown; such as end user devices including workstations, laptops, netbooks, tablets, palm tops, mobile phones, smartphones, phablets, multimedia phones, Voice Over Internet Protocol (VOIP) phones, terminals, portable media players, GPS units, wearable devices, gaming systems, set-top boxes, Internet enabled household appliances) may be coupled to the network (directly or through other networks such as access networks) to communicate over the network (e.g., the Internet or virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet) with each other (directly or through servers) and/or access content and/or services.
  • Such content and/or services are typically provided by one or more servers (not shown) belonging to a service/content provider or one or more end user devices (not shown) participating in a peer-to-peer (P2P) service, and may include, for example, public webpages (e.g., free content, store fronts, search services), private webpages (e.g., username/password accessed webpages providing email services), and/or corporate networks over VPNs.
  • end user devices may be coupled (e.g., through customer premise equipment coupled to an access network (wired or wirelessly)) to edge NDs, which are coupled (e.g., through one or more core NDs) to other edge NDs, which are coupled to electronic devices acting as servers.
  • one or more of the electronic devices operating as the NDs in Figure 9A may also host one or more such servers (e.g., in the case of the general purpose network device 904, one or more of the software instances 962A-R may operate as servers; the same would be true for the hybrid network device 906; in the case of the special-purpose network device 902, one or more such servers could also be run on a virtualization layer executed by the processor(s) 912); in which case the servers are said to be co-located with the VNEs of that ND.
  • a virtual network is a logical abstraction of a physical network (such as that in Figure 9A) that provides network services (e.g., L2 and/or L3 services).
  • a virtual network can be implemented as an overlay network (sometimes referred to as a network virtualization overlay) that provides network services (e.g., layer 2 (L2, data link layer) and/or layer 3 (L3, network layer) services) over an underlay network (e.g., an L3 network, such as an Internet Protocol (IP) network that uses tunnels (e.g., generic routing encapsulation (GRE), layer 2 tunneling protocol (L2TP), IPSec) to create the overlay network).
  • a network virtualization edge (NVE) sits at the edge of the underlay network and participates in implementing the network virtualization; the network-facing side of the NVE uses the underlay network to tunnel frames to and from other NVEs; the outward-facing side of the NVE sends and receives data to and from systems outside the network.
  • a virtual network instance (VNI) is a specific instance of a virtual network on an NVE (e.g., a NE/VNE on an ND, a part of a NE/VNE on a ND where that NE/VNE is divided into multiple VNEs through emulation); one or more VNIs can be instantiated on an NVE (e.g., as different VNEs on an ND).
  • a virtual access point (VAP) is a logical connection point on the NVE for connecting external systems to a virtual network; VAPs can be physical or virtual ports identified through logical interface identifiers (e.g., a VLAN ID).
  • Examples of network services include: 1) an Ethernet LAN emulation service (an Ethernet-based multipoint service similar to an Internet Engineering Task Force (IETF) Multiprotocol Label Switching (MPLS) or Ethernet VPN (EVPN) service) in which external systems are interconnected across the network by a LAN environment over the underlay network (e.g., an NVE provides separate L2 VNIs (virtual switching instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network); and 2) a virtualized IP forwarding service (similar to IETF IP VPN (e.g., Border Gateway Protocol (BGP)/MPLS IPVPN) from a service definition perspective) in which external systems are interconnected across the network by an L3 environment over the underlay network (e.g., an NVE provides separate L3 VNIs (forwarding and routing instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network).
  • Network services may also include quality of service capabilities (e.g., traffic classification marking, traffic conditioning and scheduling), security capabilities (e.g., filters to protect customer premises from network - originated attacks, to avoid malformed route announcements), and management capabilities (e.g., full detection and processing).
  • Figure 9D illustrates a network with a single network element on each of the NDs of Figure 9A, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.
  • Figure 9D illustrates network elements (NEs) 970A-H with the same connectivity as the NDs 900A-H of Figure 9A.
  • Figure 9D illustrates that the distributed approach 972 distributes responsibility for generating the reachability and forwarding information across the NEs 970A-H; in other words, the process of neighbor discovery and topology discovery is distributed.
  • the control communication and configuration module(s) 932A-R of the ND control plane 924 typically include a reachability and forwarding information module to implement one or more routing protocols (e.g., an exterior gateway protocol such as Border Gateway Protocol (BGP), Interior Gateway Protocol(s) (IGP) (e.g., Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), Routing Information Protocol (RIP)), Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP) (including RSVP-Traffic Engineering (TE): Extensions to RSVP for LSP Tunnels and Generalized Multi-Protocol Label Switching (GMPLS) Signaling RSVP-TE)) that communicate with other NEs to exchange routes, and then selects those routes based on one or more routing metrics.
  • the NEs 970A-H (e.g., the processor(s) 912 executing the control communication and configuration module(s) 932A-R) perform their responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by distributively determining the reachability within the network and calculating their respective forwarding information.
  • Routes and adjacencies are stored in one or more routing structures (e.g., Routing Information Base (RIB), Label Information Base (LIB), one or more adjacency structures) on the ND control plane 924.
  • the ND control plane 924 programs the ND forwarding plane 926 with information (e.g., adjacency and route information) based on the routing structure(s).
  • the ND control plane 924 programs the adjacency and route information into one or more forwarding table(s) 934A-R (e.g., Forwarding Information Base (FIB), Label Forwarding Information Base (LFIB), and one or more adjacency structures) on the ND forwarding plane 926.
  • the ND can store one or more bridging tables that are used to forward data based on the layer 2 information in that data. While the above example uses the special-purpose network device 902, the same distributed approach 972 can be implemented on the general-purpose network device 904 and the hybrid network device 906.
  • Figure 9D illustrates that a centralized approach 974 (also known as software defined networking (SDN)) decouples the system that makes decisions about where traffic is sent from the underlying systems that forward traffic to the selected destination.
  • the illustrated centralized approach 974 has the responsibility for the generation of reachability and forwarding information in a centralized control plane 976 (sometimes referred to as an SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity), and thus the process of neighbor discovery and topology discovery is centralized.
  • the centralized control plane 976 has a south bound interface 982 with a data plane 980 (sometimes referred to as the infrastructure layer, network forwarding plane, or forwarding plane (which should not be confused with a ND forwarding plane)) that includes the NEs 970A-H (sometimes referred to as switches, forwarding elements, data plane elements, or nodes).
  • the centralized control plane 976 includes a network controller 978, which includes a centralized reachability and forwarding information module 979 that determines the reachability within the network and distributes the forwarding information to the NEs 970A-H of the data plane 980 over the south bound interface 982 (which may use the OpenFlow protocol).
  • the network intelligence is centralized in the centralized control plane 976 executing on electronic devices that are typically separate from the NDs.
  • each of the control communication and configuration module(s) 932A-R of the ND control plane 924 typically include a control agent that provides the VNE side of the south bound interface 982.
  • the ND control plane 924 (the processor(s) 912 executing the control communication and configuration module(s) 932A-R) performs its responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) through the control agent communicating with the centralized control plane 976 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 979 (it should be understood that in some embodiments of the invention, the control communication and configuration module(s) 932A-R, in addition to communicating with the centralized control plane 976, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach; such embodiments are generally considered to fall under the centralized approach 974, but may also be considered a hybrid approach).
  • the same centralized approach 974 can be implemented with the general purpose network device 904 (e.g., each of the VNE 960A-R performs its responsibility for controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by communicating with the centralized control plane 976 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 979; it should be understood that in some embodiments of the invention, the VNEs 960A-R, in addition to communicating with the centralized control plane 976, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach) and the hybrid network device 906.
  • NFV is able to support SDN by providing an infrastructure upon which the SDN software can be run, and NFV and SDN both aim to make use of commodity server hardware and physical switches.
  • Figure 9D also shows that the centralized control plane 976 has a north bound interface 984 to an application layer 986, in which resides application(s) 988.
  • the centralized control plane 976 has the ability to form virtual networks 992 (sometimes referred to as a logical forwarding plane, network services, or overlay networks (with the NEs 970A-H of the data plane 980 being the underlay network)) for the application(s) 988.
  • the centralized control plane 976 maintains a global view of all NDs and configured NEs/VNEs, and it maps the virtual networks to the underlying NDs efficiently (including maintaining these mappings as the physical network changes either through hardware (ND, link, or ND component) failure, addition, or removal).
  • the applications 988 include the constrained flooding manager 981 that implements the functions described herein.
  • While Figure 9D shows the distributed approach 972 separate from the centralized approach 974, the effort of network control may be distributed differently or the two combined in certain embodiments of the invention. For example: 1) embodiments may generally use the centralized approach (SDN) 974, but have certain functions delegated to the NEs (e.g., the distributed approach may be used to implement one or more of fault monitoring, performance monitoring, protection switching, and primitives for neighbor and/or topology discovery); or 2) embodiments of the invention may perform neighbor discovery and topology discovery via both the centralized control plane and the distributed protocols, and the results compared to raise exceptions where they do not agree.
  • Such embodiments are generally considered to fall under the centralized approach 974 but may also be considered a hybrid approach.
  • While Figure 9D illustrates the simple case where each of the NDs 900A-H implements a single NE 970A-H, it should be understood that the network control approaches described with reference to Figure 9D also work for networks where one or more of the NDs 900A-H implement multiple VNEs (e.g., VNEs 930A-R, VNEs 960A-R, those in the hybrid network device 906).
  • the network controller 978 may also emulate the implementation of multiple VNEs in a single ND. Specifically, instead of (or in addition to) implementing multiple VNEs in a single ND, the network controller 978 may present the implementation of a VNE/NE in a single ND as multiple VNEs in the virtual networks 992 (all in the same one of the virtual network(s) 992, each in different ones of the virtual network(s) 992, or some combination).
  • the network controller 978 may cause an ND to implement a single VNE (a NE) in the underlay network, and then logically divide up the resources of that NE within the centralized control plane 976 to present different VNEs in the virtual network(s) 992 (where these different VNEs in the overlay networks are sharing the resources of the single VNE/NE implementation on the ND in the underlay network).
  • Figures 9E and 9F respectively illustrate exemplary abstractions of NEs and VNEs that the network controller 978 may present as part of different ones of the virtual networks 992.
  • Figure 9E illustrates the simple case of where each of the NDs 900A-H implements a single NE 970A-H (see Figure 9D), but the centralized control plane 976 has abstracted multiple of the NEs in different NDs (the NEs 970A-C and G-H) into (to represent) a single NE 970I in one of the virtual network(s) 992 of Figure 9D, according to some embodiments of the invention.
  • Figure 9E shows that in this virtual network, the NE 970I is coupled to NE 970D and 970F, which are both still coupled to NE 970E.
  • Figure 9F illustrates a case where multiple VNEs (VNE 970A.1 and VNE 970H.1) are implemented on different NDs (ND 900A and ND 900H) and are coupled to each other, and where the centralized control plane 976 has abstracted these multiple VNEs such that they appear as a single VNE 970T within one of the virtual networks 992 of Figure 9D, according to some embodiments of the invention.
  • the abstraction of a NE or VNE can span multiple NDs.
  • the electronic device(s) running the centralized control plane 976 may be implemented in a variety of ways (e.g., a special purpose device, a general-purpose (e.g., COTS) device, or a hybrid device). These electronic device(s) would similarly include processor(s), a set of one or more physical NIs, and a non-transitory machine-readable storage medium having stored thereon the centralized control plane software.
  • Figure 10 illustrates a general-purpose control plane device 1004 including hardware 1040 comprising a set of one or more processor(s) 1042 (which are often COTS processors) and physical NIs 1046, as well as non-transitory machine-readable storage media 1048 having stored therein centralized control plane (CCP) software 1050.
  • the processor(s) 1042 typically execute software to instantiate a virtualization layer 1054 (e.g., in one embodiment the virtualization layer 1054 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 1062A-R called software containers (representing separate user spaces and also called virtualization engines, virtual private servers, or jails) that may each be used to execute a set of one or more applications;
  • in another embodiment the virtualization layer 1054 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and an application is run on top of a guest operating system within an instance 1062A-R called a virtual machine (which in some cases may be considered a tightly isolated form of software container) that is run by the hypervisor;
  • an application is implemented as a unikernel, which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application, and the unikernel can run directly on hardware 1040, directly on a hypervisor represented by virtualization layer 1054 (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container represented by one of instances 1062A-R).
  • an instance of the CCP software 1050 (illustrated as CCP instance 1079A) is executed (e.g., within the instance 1062A) on the virtualization layer 1054.
  • the CCP instance 1079A is executed, as a unikernel or on top of a host operating system, on the “bare metal” general purpose control plane device 1004. The instantiation of the CCP instance 1079A, as well as the virtualization layer 1054 and instances 1062A-R if implemented, are collectively referred to as software instance(s) 1052.
  • the CCP instance 1079A includes a network controller instance 1078.
  • the network controller instance 1078 includes a centralized reachability and forwarding information module instance 1079 (which is a middleware layer providing the context of the network controller 1078 to the operating system and communicating with the various NEs), and a CCP application layer 1080 (sometimes referred to as an application layer) over the middleware layer (providing the intelligence required for various network operations such as protocols, network situational awareness, and user interfaces).
  • this CCP application layer 1080 within the centralized control plane 1076 works with virtual network view(s) (logical view(s) of the network) and the middleware layer provides the conversion from the virtual networks to the physical view.
  • the network controller instance 1078 implements the constrained flooding manager 1081 that implements the functions described herein.
  • the centralized control plane 1076 transmits relevant messages to the data plane 1080 based on CCP application layer 1080 calculations and middleware layer mapping for each flow.
  • a flow may be defined as a set of packets whose headers match a given pattern of bits; in this sense, traditional IP forwarding is also flow-based forwarding where the flows are defined by the destination IP address for example; however, in other implementations, the given pattern of bits used for a flow definition may include more fields (e.g., 10 or more) in the packet headers.
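As an illustration of the flow definition just described, a minimal sketch of pattern-based matching follows; the header field names and the WILDCARD convention are assumptions made for the example, not part of the specification.

```python
# Minimal sketch of flow matching: a flow is a set of packets whose
# headers match a given pattern of bits. Field names are illustrative.
WILDCARD = None  # a field set to WILDCARD matches any value

def matches(flow_pattern: dict, packet_headers: dict) -> bool:
    """True if every non-wildcard field in the pattern equals the packet's."""
    return all(value == WILDCARD or packet_headers.get(field) == value
               for field, value in flow_pattern.items())

# traditional IP forwarding viewed as flow-based forwarding: the flow is
# defined by the destination IP address alone (prefix matching elided)
dest_only_flow = {"dst_ip": "10.0.0.2"}
# a finer-grained flow definition using more header fields
five_tuple_flow = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
                   "src_port": 12345, "dst_port": 80, "proto": "TCP"}
```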
  • Different NDs/NEs/VNEs of the data plane 1080 may receive different messages, and thus different forwarding information.
  • the data plane 1080 processes these messages and programs the appropriate flow information and corresponding actions in the forwarding tables (sometime referred to as flow tables) of the appropriate NE/VNEs, and then the NEs/VNEs map incoming packets to flows represented in the forwarding tables and forward packets based on the matches in the forwarding tables.
  • Standards such as OpenFlow define the protocols used for the messages, as well as a model for processing the packets.
  • the model for processing packets includes header parsing, packet classification, and making forwarding decisions. Header parsing describes how to interpret a packet based upon a well-known set of protocols. Some protocol fields are used to build a match structure (or key) that will be used in packet classification (e.g., a first key field could be a source media access control (MAC) address, and a second key field could be a destination MAC address).
  • Packet classification involves executing a lookup in memory to classify the packet by determining which entry (also referred to as a forwarding table entry or flow entry) in the forwarding tables best matches the packet based upon the match structure, or key, of the forwarding table entries. It is possible that many flows represented in the forwarding table entries can correspond/match to a packet; in this case the system is typically configured to determine one forwarding table entry from the many according to a defined scheme (e.g., selecting a first forwarding table entry that is matched).
  • Forwarding table entries include both a specific set of match criteria (a set of values or wildcards, or an indication of what portions of a packet should be compared to a particular value/values/wildcards, as defined by the matching capabilities - for specific fields in the packet header, or for some other packet content), and a set of one or more actions for the data plane to take on receiving a matching packet. For example, an action may be to push a header onto the packet, forward the packet using a particular port, flood the packet, or simply drop the packet.
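A minimal sketch of the classification and action model described in the preceding paragraphs follows; first-match-wins stands in for the “defined scheme” mentioned above, and all names are illustrative assumptions.

```python
# Illustrative sketch of packet classification against forwarding table
# entries: each entry carries match criteria (None = wildcard) and a list
# of actions; the first matching entry is selected, per a simple scheme.
from typing import Optional

class FlowEntry:
    def __init__(self, match: dict, actions: list):
        self.match = match      # field -> required value, None = wildcard
        self.actions = actions  # e.g., [("push_header", "hdr"), ("output", 3)]

def classify(table: list, headers: dict) -> Optional[FlowEntry]:
    for entry in table:
        if all(v is None or headers.get(k) == v
               for k, v in entry.match.items()):
            return entry        # first matching forwarding table entry wins
    return None                 # a match-miss: an "unknown packet"
```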
  • When an unknown packet (for example, a “missed packet” or a “match-miss” as used in OpenFlow parlance) arrives at the data plane 1080, the packet (or a subset of the packet header and content) is typically forwarded to the centralized control plane 1076.
  • the centralized control plane 1076 will then program forwarding table entries into the data plane 1080 to accommodate packets belonging to the flow of the unknown packet. Once a specific forwarding table entry has been programmed into the data plane 1080 by the centralized control plane 1076, the next packet with matching credentials will match that forwarding table entry and take the set of actions associated with that matched entry.
  • a network interface may be physical or virtual; and in the context of IP, an interface address is an IP address assigned to a NI, be it a physical NI or virtual NI.
  • a virtual NI may be associated with a physical NI, with another virtual interface, or stand on its own (e.g., a loopback interface, a point-to-point protocol interface).
  • a NI (physical or virtual) may be numbered (a NI with an IP address) or unnumbered (a NI without an IP address).
  • a loopback interface (and its loopback address) is a specific type of virtual NI (and IP address) of a NE/VNE (physical or virtual) often used for management purposes, where such an IP address is referred to as the nodal loopback address. The IP address(es) assigned to the NI(s) of a ND are referred to as IP addresses of that ND; at a more granular level, the IP address(es) assigned to NI(s) assigned to a NE/VNE implemented on a ND can be referred to as IP addresses of that NE/VNE.
  • Next hop selection by the routing system for a given destination may resolve to one path (that is, a routing protocol may generate one next hop on a shortest path); but if the routing system determines there are multiple viable next hops (that is, the routing protocol generated forwarding solution offers more than one next hop on a shortest path - multiple equal cost next hops), some additional criteria are used - for instance, in a connectionless network, Equal Cost Multi Path (ECMP) (also known as Equal Cost Multi Pathing, multipath forwarding and IP multipath) may be used (e.g., typical implementations use as the criteria particular header fields to ensure that the packets of a particular packet flow are always forwarded on the same next hop to preserve packet flow ordering).
  • a packet flow is defined as a set of packets that share an ordering constraint.
  • the set of packets in a particular TCP transfer sequence need to arrive in order, else the TCP logic will interpret the out of order delivery as congestion and slow the TCP transfer rate down.
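The header-field criteria mentioned above are commonly realized by hashing; a minimal sketch follows, with the field list and hash choice being assumptions for illustration. Because the same flow always yields the same hash, all of its packets take the same next hop and the ordering constraint is preserved.

```python
# Sketch of flow-stable ECMP next-hop selection: hash the header fields
# that identify the flow, then index into the list of equal-cost next hops.
import hashlib

def ecmp_next_hop(next_hops: list, headers: dict):
    flow_key = "|".join(str(headers.get(f)) for f in
                        ("src_ip", "dst_ip", "src_port", "dst_port", "proto"))
    digest = hashlib.sha256(flow_key.encode()).digest()
    # same flow -> same digest -> same next hop, preserving packet ordering
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]
```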

Abstract

A network device implements a method of managing the forwarding of link state advertisements in an interior gateway protocol. The method computes a first spanning tree with a first root for a network, computes a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network, and floods link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.

Description

SPECIFICATION
LIMITED FLOODING IN DENSE GRAPHS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Application No. 62/738,916, filed September 28, 2018, which is hereby incorporated by reference.
TECHNICAL FIELD
[0002] Embodiments of the invention relate to the field of interior gateway protocol (IGP) operation; and more specifically, to a process for reducing quantity of link state advertisements (LSAs) that are flooded in dense mesh networks.
BACKGROUND ART
[0003] A mesh network is a type of network topology. In a mesh network the
infrastructure nodes (i.e. routers, switches and similar devices) connect to one another directly. Hierarchical mesh networks are often utilized in data centers to connect the nodes of the data center, e.g., servers, switches and similar devices. In such environments, the number of nodes and interconnects can be very high, which results in a mesh network referred to as a dense mesh network.
[0004] Interior gateway protocols (IGPs) are a type of protocol used for exchanging routing information between nodes within an autonomous system (AS). This routing information can be used for routing of data using network layer protocols (e.g., the Internet Protocol (IP)). IGPs can exchange routing information using link state advertisements (LSAs) or similar messages. The LSAs enable nodes to share their local network topology and other information with other nodes in the network (e.g., within an AS). The nodes may share their adjacencies, i.e., the links to neighbor nodes within the network. These LSAs can be flooded within the network such that each node in the network can determine a complete network topology that enables the computation of routing information for the network layer protocols. These LSAs are flooded by a node when the node has a change of local topology or during similar events. The normal procedure when a node receives an LSA that it has not received before is that the receiving node refloods the LSA on all adjacencies except the one of arrival.
[0005] However, in dense mesh networks, this LSA flooding can create a large amount of redundant traffic due to the large number of nodes and the high degree of interconnectedness of the nodes. Due to the high degree of interconnectedness each node may receive the same flooded LSA message multiple times, creating significant inefficiency in the LSA distribution process; the normal process being to check a received LSA against the local routing database, and if not already there, add the LSA information to the local routing database and flood the LSA to all neighbors. When many redundant copies of the LSA are received, the overhead of processing to validate that an LSA has not already been received can be onerous.
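To make the redundancy concrete, the conventional behavior described in this paragraph can be sketched as follows; the object and attribute names are illustrative assumptions. In a mesh where every node has degree d, a node can receive up to d copies of the same LSA, all but the first being pure revalidation overhead.

```python
# Sketch of conventional (unconstrained) LSA flooding: a novel LSA is
# stored and reflooded on every adjacency except the one it arrived on.
def on_lsa_received(node, lsa, arrival_adjacency):
    key = (lsa.origin, lsa.sequence_number)
    if key in node.link_state_database:
        return                               # duplicate: processing wasted
    node.link_state_database[key] = lsa      # new information: record it
    for adjacency in node.adjacencies:       # reflood on all other links
        if adjacency is not arrival_adjacency:
            adjacency.send(lsa)
```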
SUMMARY
[0006] In one embodiment, a method of managing the forwarding of link state advertisements in an interior gateway protocol (IGP) is implemented. The method includes computing a first spanning tree with a first root for a network, computing a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree collectively defining a constrained flooding topology for the network, and flooding link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
[0007] In another embodiment, a network device executes the method of managing the forwarding of link state advertisements in the IGP. The network device includes a non-transitory computer readable medium having stored therein a constrained flooding manager, and a processor coupled to the non-transitory computer readable medium. The processor executes the constrained flooding manager. The constrained flooding manager computes a first spanning tree with a first root for a network, computes a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network, and floods link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
[0008] In a further embodiment, a computing device implements a plurality of virtual machines. The plurality of virtual machines implements network function virtualization (NFV), where at least one virtual machine from the plurality of virtual machines implements the method of managing the forwarding of link state advertisements in an interior gateway protocol. The computing device includes a non-transitory computer readable medium having stored therein a constrained flooding manager, and a processor coupled to the non-transitory computer readable medium. The processor executes the at least one virtual machine from the plurality of virtual machines. The virtual machine executes the constrained flooding manager. The constrained flooding manager computes a first spanning tree with a first root for a network, computes a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network, and floods link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
[0009] In one embodiment, a control plane device is in communication with a plurality of data plane nodes in a software defined networking (SDN) network. The control plane device implements the method of managing the forwarding of link state advertisements in an IGP. The control plane device includes a non-transitory computer readable medium having stored therein a constrained flooding manager, and a processor coupled to the non-transitory computer readable medium. The processor executes the constrained flooding manager. The constrained flooding manager computes a first spanning tree with a first root for a network, computes a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network, and floods link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The invention may best be understood by referring to the following description and accompanying drawings that are used to illustrate embodiments of the invention. In the drawings:
[0011] Figure 1 is a diagram of one embodiment of a physical topology of a dense mesh network.
[0012] Figure 2 is a diagram of one embodiment of a set of spanning trees for link state advertisement (LSA) forwarding in a dense mesh network.
[0013] Figure 3 is a flowchart of one embodiment of a process for establishing constrained flooding in a network.
[0014] Figure 4A is a flowchart of one embodiment of a process for handling LSAs received at a node in the dense mesh network.
[0015] Figure 4B is a flowchart of one embodiment of a process for handling LSAs generated by local events.
[0016] Figure 5 is a diagram of one example embodiment of LSA forwarding for a subset of nodes where there is overlap between the low spanning tree and the high spanning tree.
[0017] Figure 6 is a flowchart of a process implemented by the nodes that self-identify as ‘needy’ nodes.
[0018] Figure 7 is a flowchart of a process implemented by the nodes that self-identify as ‘constellation’ nodes.
[0019] Figure 8 is a diagram of an example network topology that illustrates a special constrained flooding case.
[0020] Figure 9A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention.
[0021] Figure 9B illustrates an exemplary way to implement a special-purpose network device according to some embodiments of the invention.
[0022] Figure 9C illustrates various exemplary ways in which virtual network elements (VNEs) may be coupled according to some embodiments of the invention.
[0023] Figure 9D illustrates a network with a single network element (NE) on each of the NDs, and within this straight forward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention.
[0024] Figure 9E illustrates the simple case of where each of the NDs implements a single NE, but a centralized control plane has abstracted multiple of the NEs in different NDs into (to represent) a single NE in one of the virtual network(s), according to some embodiments of the invention.
[0025] Figure 9F illustrates a case where multiple VNEs are implemented on different NDs and are coupled to each other, and where a centralized control plane has abstracted these multiple VNEs such that they appear as a single VNE within one of the virtual networks, according to some embodiments of the invention.
[0026] Figure 10 illustrates a general-purpose control plane device with centralized control plane (CCP) software 1050, according to some embodiments of the invention.
DETAILED DESCRIPTION
[0027] The following description describes methods and apparatus for managing link state advertisements (LSAs) for the interior gateway protocol (IGP). The embodiments provide an LSA forwarding process that reduces the quantity of LSAs that the IGP floods to the nodes in a network while ensuring that all nodes in the network receive at least one copy of every LSA introduced into the LSA forwarding process. The process provides a significant reduction in forwarded LSA traffic in dense mesh networks or similar networks. The LSA forwarding process utilizes at least two equal and diverse paths between any pair of IGP nodes in the network. In some embodiments of the LSA forwarding process two spanning trees are computed to be utilized for the LSA forwarding in the network. The embodiments of the LSA forwarding process can utilize the tie breaking algorithm used in Institute of Electrical and Electronics Engineers (IEEE) 802.1aq Shortest Path Bridging as an element of construction of the LSA forwarding topology. While the embodiments are described as applied to LSA forwarding, one skilled in the art would understand that the process is applicable to similar network management scenarios.
[0028] In the following description, numerous specific details such as logic implementations, opcodes, means to specify operands, resource partitioning/sharing/duplication implementations, types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
[0029] References in the specification to“one embodiment,”“an embodiment,”“an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
[0030] Bracketed text and blocks with dashed borders (e.g., large dashes, small dashes, dot- dash, and dots) may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
[0031] In the following description and claims, the terms“coupled” and“connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other.“Coupled” is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.“Connected” is used to indicate the establishment of communication between two or more elements that are coupled with each other.
[0032] An electronic device stores and transmits (internally and/or with other electronic devices over a network) code (which is composed of software instructions and which is sometimes referred to as computer program code or a computer program) and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media (e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory) and machine-readable transmission media (also called a carrier) (e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals). Thus, an electronic device (e.g., a computer) includes hardware and software, such as a set of one or more processors (e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, a combination of one or more of the preceding) coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data. For instance, an electronic device may include non-volatile memory containing the code since the non-volatile memory can persist code/data even when the electronic device is turned off (when power is removed), and while the electronic device is turned on that part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory (e.g., dynamic random access memory (DRAM), static random access memory (SRAM)) of that electronic device. Typical electronic devices also include a set of one or more physical network interface(s) (NI(s)) to establish network connections (to transmit and/or receive code and/or data using propagating signals) with other electronic devices. For example, the set of physical NIs (or the set of physical NI(s) in combination with the set of processors executing code) may perform any formatting, coding, or translating to allow the electronic device to send and receive data whether over a wired and/or a wireless connection. In some embodiments, a physical NI may comprise radio circuitry capable of receiving data from other electronic devices over a wireless connection and/or sending data out to other devices via a wireless connection. This radio circuitry may include transmitter(s), receiver(s), and/or transceiver(s) suitable for radiofrequency communication. The radio circuitry may convert digital data into a radio signal having the appropriate parameters (e.g., frequency, timing, channel, bandwidth, etc.). The radio signal may then be transmitted via antennas to the appropriate recipient(s). In some embodiments, the set of physical NI(s) may comprise network interface controller(s) (NICs), also known as a network interface card, network adapter, or local area network (LAN) adapter. The NIC(s) may facilitate in connecting the electronic device to other electronic devices allowing them to communicate via wire through plugging in a cable to a physical port connected to a NIC. One or more parts of an embodiment of the invention may be implemented using different combinations of software, firmware, and/or hardware.
[0033] A network device (ND) is an electronic device that communicatively interconnects other electronic devices on the network (e.g., other network devices, end-user devices). Some network devices are “multiple services network devices” that provide support for multiple networking functions (e.g., routing, bridging, switching, Layer 2 aggregation, session border control, Quality of Service, and/or subscriber management), and/or provide support for multiple application services (e.g., data, voice, and video).
[0034] As used herein, a member adjacency or adjacent member node refers to an adjacency between a given node and an immediately connected node in a network topology (e.g., an IGP network topology) where both are IGP speakers. An IGP speaker is a node that implements IGP within the physical topology of the network. The constrained flooding topology is a
contiguously connected subset of the physical topology of interconnect between IGP speakers. A ‘member’ node is a node that is a part of the constrained flooding topology. A member adjacency indicates that the immediately connected node is a member node and the adjacency is part of the constrained flooding topology. It is also possible for member nodes to be connected by adjacencies that are not part of the constrained flooding topology.
[0035] A participating node is a node in the network that is an IGP speaker (i.e., a node that implements IGP) that has advertised the capability, and thus the intention, to participate in a constrained flooding topology. Similarly, a participating adjacency refers to an adjacency, i.e., an immediate link or connection, between two participating nodes. A contiguous set of participant nodes are also member nodes. A participant node may be severed from all adjacencies with other participant nodes, and hence can simultaneously be a participant and a non-member node.
[0036] A non-participating node refers to a node in the network that is an IGP speaker but has not advertised a capability or intent to participate in a constrained flooding topology. For example, a node may be a non-participant when the capability is not implemented, or the implementation may support participation in a constrained flooding topology, but the node has been administratively configured to be a non-participant.
[0037] Overview
[0038] Constrained Flooding Topology
[0039] A constrained flooding topology is composed of a contiguously connected set of participating nodes. The constrained flooding topology can be constructed from two diversely rooted spanning trees. Thus, there are two spanning trees each with a different root in the network. Every participating node that has more than one participating adjacency is required to be bi-connected to the constrained flooding topology. This is achieved by constrained flooding topology construction. For a bipartite graph or similar hierarchical network, the resulting constrained flooding topology diameter will typically be two times the depth of the tree hierarchy. The compromise in this approach is that a subset of nodes in the network will not see a reduction of the replication burden from current practice when flooding LSAs. The degree of this subset of nodes in the constrained flooding topology, i.e., the number of adjacencies for this subset of nodes, will correspond to the degree of the physical topology, i.e., the number of adjacencies of nodes in the physical topology.
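By way of a non-limiting illustration, the construction can be pictured as the union of two shortest-path trees grown from diverse roots. The following Python sketch assumes unit link costs and uses plain breadth-first search in place of the full IEEE 802.1aq computation; the names spanning_tree and flooding_topology are illustrative only and do not appear in the embodiments.

from collections import deque

def spanning_tree(adjacency, root):
    # Grow a shortest-path tree from `root`; returns a node -> parent map.
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adjacency[node]):  # a stand-in for deterministic tie breaking
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent

def flooding_topology(adjacency, low_root, high_root):
    # The constrained flooding topology is the union of the two trees' edges.
    edges = set()
    for root in (low_root, high_root):
        for node, up in spanning_tree(adjacency, root).items():
            if up is not None:
                edges.add(frozenset((node, up)))
    return edges

In this sketch, a node that acquires one adjacency on each tree is bi-connected to the resulting topology.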
[0040] In some embodiments, the participating nodes can utilize LSAs to flood local topology information and related information to other nodes in the network. A member node may forward a received LSA to adjacent member nodes via the constrained flooding topology. Specific forwarding rules beyond that normally associated with spanning tree forwarding prevent undue flooding of the LSAs. The result of the flooding of LSAs using the constrained flooding topology is that every participant node that has more than one participating adjacency will be bi-connected to the constrained flooding topology and will receive two copies of any flooded LSA in a fault free dense mesh network. Participating nodes that are only singly connected will receive one copy, as would a chain of bi-connected nodes terminating on a singly connected node. Participating nodes may be singly connected to the constrained flooding topology due to degradation of the network, e.g., a failure of a link or node, or as an artifact of network design. The participating nodes implement a set of forwarding rules for handling LSAs in the constrained flooding topology. These forwarding rules are described further herein below.
[0041] Applicability of Constrained Flooding Process
[0042] A constrained flooding process can be applied to varying network topologies. These network topologies can include those networks that can be represented as pure bipartite graphs, bipartite graphs modified with the addition of intra-tier adjacencies, and hierarchical variations of the above. The advantages of the constrained flooding process will vary according to network topology, with the aforementioned network topologies having discernable reductions in LSA traffic when the constrained flooding process is implemented. For the sake of clarity and conciseness, the example network topologies and the graphs representing these networks are assumed to have link costs that are common for all inter-tier links and common for any intra-tier links.
[0043] The Constrained Flooding Process
[0044] The constrained flooding process constructs two spanning trees to be utilized for the forwarding of LSAs amongst participating nodes. For example, the constrained flooding process can utilize the tie breaking algorithm from the Institute of Electrical and Electronics Engineers (IEEE) 802.1aq. The IEEE 802.1aq shortest path bridging process for the construction of the spanning trees is used herein by way of example for spanning tree construction. In particular, the process used is described in clause 28.5 of IEEE 802.1aq.
[0045] A component of the IEEE 802.1aq tree computation process employed in the embodiments is the tie breaking component. The IEEE 802.1aq tree computation process produces a symmetrically congruent mesh of multicast trees and unicast forwarding whereby the path between any two nodes in the network is symmetric in both directions and congruent for both unicast and multicast traffic. For the constrained flooding process, the IEEE 802.1aq tree computation process or a similar tree computation process is used in the generation of two diversely rooted spanning trees that define the constrained flooding topology.
[0046] As part of tree construction, the IEEE 802.1aq tree computation tie breaks between equal cost paths, i.e., the IEEE 802.1aq tree computation utilizes a deterministic tie breaking process to select one of a set of equal cost paths when constructing a tree, in this case a spanning tree. When a set of equal cost paths is encountered as part of a Dijkstra computation that is a part of the tree computation, a path-identifier (path-id) is constructed for each equal cost path. A path-id is expressed as a lexicographically sorted list of the node-identifiers (node-ids) for a given equal cost path. The set of equal cost paths is ranked using these path-ids, and the lowest ranking path-id and corresponding path are selected. The ranking is based on the lexicographical sorting. As an example, a path-id 23-39-44-85-98 is ranked lower than a path-id 23-44-59-90-93. When the path-ids are of unequal length, the path-ids with the fewest hops are ranked as being superior to the longer paths, and tie breaking is applied to select between the shorter path-ids. The node-ids used would be the loopback address of each node; therefore, each path-id will be unique.
[0047] The IEEE 802.1aq tree computation process includes the concept of an "algorithm-mask", which is a value XOR'd with the node-ids prior to sorting into path-ids and ranking the paths. This algorithm-mask permits the construction of diverse trees in a dense topology. Two algorithm masks are used for the construction of the two diverse spanning trees used to define the constrained flooding topology (zero and -1). When computing two trees from the same root, when there are at least two nodes to choose from at each distance from the root, fully diverse trees will be generated. When computing two trees from diverse roots in a tree architecture, diverse nodes will be selected in each tier in the hierarchy as the relay nodes to the next tier. The selection of relay nodes does have implications for root selection as described further herein below.
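By way of illustration, the ranking and masking described above can be sketched in Python as follows, treating -1 as a 32-bit all-ones mask; select_path is an illustrative name, and the fragment is not the full clause 28.5 procedure.

MASK_LOW = 0x00000000   # algorithm-mask for the low spanning tree
MASK_HIGH = 0xFFFFFFFF  # algorithm-mask for the high spanning tree (-1 as 32-bit all-ones)

def path_id(node_ids, mask):
    # A path-id is the lexicographically sorted list of (masked) node-ids.
    return sorted(node_id ^ mask for node_id in node_ids)

def select_path(equal_cost_paths, mask):
    # The fewest hops win first; remaining ties go to the lowest ranking path-id.
    return min(equal_cost_paths, key=lambda path: (len(path), path_id(path, mask)))

# The example from paragraph [0046]: 23-39-44-85-98 ranks lower than 23-44-59-90-93.
assert select_path([[23, 39, 44, 85, 98], [23, 44, 59, 90, 93]], MASK_LOW) == [23, 39, 44, 85, 98]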
[0048] The IEEE 802.1aq tree computation process has the property of permitting the pruning of intermediate state as a Dijkstra computation progresses, since equal cost path ties can be immediately evaluated and all paths other than the selected path removed from further consideration. This is desirable when performing a Dijkstra computation in a dense graph, as all path permutations do not need to be carried forward during computation. This permits the computation of the spanning tree to be quite fast despite the complexity of a dense mesh. The resulting computational complexity can be expressed as 2N ln(N), where N is the number of nodes in the network.
[0049] In some embodiments, the constrained LSA flooding process depends on tie breaking between sets of node IDs to produce diverse paths, and therefore it can place some restrictions on root selection. A root can be selected so that the root’s node-id, when XOR'd with the associated algorithm mask, is the lowest ranked node in the local tier in the tree hierarchy. This would be analogous to path-id ranking where the paths were all of length 1. In some embodiments, the root is not selected such that the node-id, when XOR'd with the other root’s algorithm mask, is the lowest ranked node. This would result in the root also being a transit node for the other spanning tree and produce a scenario whereby a single failure could render both spanning trees incomplete. The embodiments can avoid roots that are directly connected for the low and high spanning trees. If the topology does not permit this to be satisfied purely by root selection, then the inter-root adjacency can be pruned from the graph prior to spanning tree computation to ensure that diverse paths between the roots are used. For a true bipartite graph, there may be no other restrictions on node selection. For a bipartite graph modified with intra-tier links, the roots can be placed in different tiers to ensure a pathological combination of link weights and node-ids does not result in a scenario where a single failure would render the constrained flooding topology incomplete. Other sources of failure may exist that can be addressed by introducing an administrative component to root selection. This, for example, would ensure that both roots were not selected from a common shared risk group.
[0050] Figure 1 is a diagram of an example network with a full mesh topology. The underlying full mesh of connections between the nodes is provided by way of example and not limitation. In the diagram, an example dense mesh network with 56 nodes numbered 0 to 55 is illustrated. In this example, each of the nodes in tier 1 (Nodes 48-55) is connected to each of the nodes in tier 2 (Nodes 24-47). Nodes 24-31 are connected to each of nodes 0 to 7 in tier 3. Nodes 32-39 are connected to each of nodes 8 to 15 in tier 3. Nodes 40-47 are connected to each of nodes 16 to 23 in tier 3. This organization of a dense mesh network is provided by example. A dense mesh network can include any number of nodes and have any number of tiers. The nodes in each tier are connected with a large proportion of the nodes in adjacent tiers.
[0051] Figure 2 is a diagram of one embodiment of a set of spanning trees for link state advertisement (LSA) forwarding in a dense mesh network. In this example, two spanning trees are illustrated connecting each of the nodes in the dense mesh network. The underlying full mesh connections shown in Figure 1 are not illustrated here for clarity. These two spanning trees 101 and 105 collectively form a constrained flooding topology. Spanning tree 101 is rooted 103 at node 48. The root 107 of spanning tree 105 is node 55. Thus, each node in the network has one connection with each of the spanning trees 101 and 105. The spanning trees 101 and 105 have diverse roots and paths to minimize overlap such that a single failure of a link or node will not prevent at least one copy of an LSA to reach each node. The roots can be selected by any mechanism that permits fully diverse spanning trees to be computed using a spanning tree computation (e.g., using IEEE 802.1aq). The spanning trees 101 and 105 are capable of propagating LSAs across the constrained flooding topology with a latency comparable to unconstrained flooding of LSAs in the dense mesh network.
[0052] Each node in the dense mesh topology is an IGP speaker. Each IGP speaker in the network has knowledge of each of the two spanning tree roots and the algorithm mask associated with each. Each participating IGP speaker in the network computes a spanning tree from each of the two roots (using the algorithm mask associated with each root) and from that can determine its own role in the constrained flooding topology. In one embodiment, the two spanning trees can be referred to as the “low spanning tree” and the “high spanning tree.”
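For illustration, the role determination can be sketched as a classification of each member adjacency against the two computed trees; representing each tree as a node-to-parent map is an assumption of this sketch, not a requirement of the embodiments.

def adjacency_roles(node, low_tree, high_tree):
    # Classify each member adjacency of `node` as upstream, downstream, or
    # ambiguous (upstream on one spanning tree and downstream on the other).
    def relation(tree, neighbor):
        if tree.get(node) == neighbor:
            return "upstream"    # toward the root on this tree
        if tree.get(neighbor) == node:
            return "downstream"  # away from the root on this tree
        return None              # not a member adjacency on this tree

    neighbors = {tree.get(node) for tree in (low_tree, high_tree)} - {None}
    for tree in (low_tree, high_tree):
        neighbors |= {n for n, up in tree.items() if up == node}

    roles = {}
    for neighbor in neighbors:
        rel = {relation(low_tree, neighbor), relation(high_tree, neighbor)} - {None}
        roles[neighbor] = rel.pop() if len(rel) == 1 else "ambiguous"
    return roles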
[0053] The operations in the flow diagrams will be described with reference to the exemplary embodiments of the other figures. However, it should be understood that the operations of the flow diagrams can be performed by embodiments of the invention other than those discussed with reference to the other figures, and the embodiments of the invention discussed with reference to these other figures can perform operations different than those discussed with reference to the flow diagrams.
[0054] Figure 3 is a flowchart of one embodiment of a process for establishing constrained LSA flooding in a network. The dense mesh network begins in a state where the topology is unstable and there are no spanning trees established for constrained LSA flooding. The topology can be unstable where there has not been sufficient time for the nodes in the network to compute routing in response to the change of links and nodes in the network advertised via LSA. The constrained flooding process can begin by initializing a set of timers (e.g., timers T1 and T2) (Block 301). T1 can represent a period of quiescence for LSA advertisements after which a first spanning tree (i.e., either the high or low spanning tree) is computed. T2 can represent a second period of quiescence for LSA advertisements after which a second spanning tree (i.e., the other of the high or low spanning trees) is computed. T1 and T2 can be selected such that one complete spanning tree is always stable. In this embodiment, T1 is less than T2.
[0055] The timers T1 and T2 can be continuously decremented over time until they expire.
The constrained LSA flooding process can await the expiration of the timers while monitoring for received or locally generated LSAs as well as similar events (Block 303). If an LSA is received from an adjacent node, then the LSA is flooded on all interfaces to all adjacent nodes except for the interface on which the LSA was received (Block 305). The constrained LSA flooding process then resets the timers (T1 and T2) (Block 301) and awaits the next event (Block 303). If an LSA was locally generated by the node, e.g., where the node determines a change on its interfaces with adjacent nodes, then the LSA is flooded on all interfaces of the node to all adjacent nodes in the dense mesh network (Block 307). The constrained LSA flooding process then resets the timers (T1 and T2) (Block 301) and awaits the next event (Block 303).
[0056] If the first timer (T1) has expired, then there has been no receipt of LSAs indicating changes in the topology of the dense mesh network during T1. At this time the root of the first spanning tree is determined using any process or mechanism for determining a diverse set of roots (Block 309). The second root can also be determined at this time. With the first root selected, the node can compute the first spanning tree using the first root (Block 311). The first spanning tree can be computed using the IEEE 802.1aq process or a similar spanning tree computation process.
[0057] The constrained LSA flooding process can continue to monitor for events as the second timer continues to decrement (Block 313). If an LSA is received from an adjacent node, then the LSA is flooded on all interfaces to all adjacent nodes except for the interface on which the LSA was received (Block 305). The constrained LSA flooding process then resets the timers (T1 and T2) (Block 301) and awaits the next event (Block 303). If an LSA was locally generated by the node, e.g., where the node determines a change on its interfaces with adjacent nodes, then the LSA is flooded on all interfaces of the node to all adjacent nodes in the dense mesh network (Block 307). The constrained LSA flooding process then resets the timers (T1 and T2) (Block 301) and awaits the next event (Block 303).
[0058] If the second timer (T2) has expired, then there has been no receipt of LSAs indicating changes in the topology of the dense mesh network during the time between T1 and T2. At this time, the node can compute the second spanning tree using the second root (Block 315). The second spanning tree can be computed using the IEEE 802.1aq process or a similar spanning tree computation process. The constrained LSA flooding process then enters a state of having established spanning trees and a stable network topology.
[0059] While in the state of having established spanning trees and a stable network topology, the constrained LSA flooding process monitors for events, including a received LSA or a locally generated LSA (Block 317). If a local LSA is generated or an LSA is received, then the LSA is forwarded according to a set of rules for the constrained flooding topology (Block 319). These rules are discussed in further detail with relation to Figures 4A and 4B.
[0060] After the received or generated LSAs are flooded according to the rules of the constrained flooding topology, the constrained LSA flooding process resets the timers, i.e., timers T1 and T2 (Block 321), and enters a state where the spanning trees are established, but the network topology is unstable. The constrained LSA flooding process monitors for further events (Block 323). If a local LSA is generated or an LSA is received, then the process floods the LSA according to the rules for the constrained flooding topology as discussed herein below (Block 325). If the timer T1 expires, then the constrained LSA process starts to determine the first and second roots (Block 329) and recomputes the first spanning tree (Block 331). The process then monitors for further events (Block 333), and if the second timer expires, then the constrained LSA flooding process computes the second spanning tree (Block 335) before re-entering the spanning tree established and network topology stable state to await further events (Block 317). If additional LSAs are received or generated, then the constrained LSA flooding process floods LSAs according to the set of rules for the constrained flooding topology (Block 319) and resets the timers (Block 321). The constrained LSA process stays in the spanning trees established and network topology unstable state until both timers T1 and T2 expire.
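The timer handling of Figure 3 reduces to a small state machine. The following sketch assumes hypothetical callbacks on the node (determine_roots, compute_first_spanning_tree, compute_second_spanning_tree, flood_per_current_rules) and illustrative default values for T1 and T2; none of these names or values are specified by the embodiments.

class ConstrainedFloodingTimers:
    # T1 and T2 quiescence timers of Figure 3; T1 < T2 so that the two spanning
    # trees are never recomputed at the same time and one complete tree stays stable.
    def __init__(self, t1=5.0, t2=10.0):
        assert t1 < t2
        self.t1, self.t2 = t1, t2
        self.reset()

    def reset(self):
        self.remaining_t1, self.remaining_t2 = self.t1, self.t2

    def on_lsa(self, node):
        # Any received or locally generated LSA restarts both quiescence periods.
        node.flood_per_current_rules()
        self.reset()

    def tick(self, node, elapsed):
        if self.remaining_t1 > 0:
            self.remaining_t1 -= elapsed
            if self.remaining_t1 <= 0:
                node.determine_roots()               # Blocks 309/329
                node.compute_first_spanning_tree()   # Blocks 311/331
        if self.remaining_t2 > 0:
            self.remaining_t2 -= elapsed
            if self.remaining_t2 <= 0:
                node.compute_second_spanning_tree()  # Blocks 315/335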
[0061] The high and low spanning trees maintained by this process provide a redundant topology. Contrary to the common usage of a spanning tree, in the embodiments the distinction between upstream and downstream adjacencies between nodes is important and is an input to how a participant node further relays any LSAs that are received. Upstream member adjacencies are in the direction of a root, and downstream member adjacencies are in the direction away from the root. The reason for this is that the constrained flooding topology is the combination of the two spanning trees; they do not operate independently.
[0062] Figure 4A is a flowchart of one embodiment of a process for handling LSAs received at a node in the dense mesh network. The embodiments do not require that the flooded LSA’s protocol design be modified to include additional information; in particular, no information is needed to associate a received LSA with a given tree. The flowchart illustrates the forwarding rules for the constrained LSA flooding process. The constrained LSA flooding process is activated in response to a node in the constrained flooding topology receiving an LSA via an adjacency (i.e., on an interface from an adjacent node in the network) (Block 401). A check is made to determine whether the received LSA has been received previously (Block 403). This check may utilize the information in the local routing table to compare with the information being advertised in the received LSA. The receiving node does not relay LSAs that it has already seen, nor does it add the information to the local information store, as it is already there; thus, a previously received LSA is discarded (Block 405).
[0063] For new LSAs, a check is made whether the received LSA is from a non-participant adjacency (i.e., the LSA is received from an adjacent non-participant node) (Block 407). If the LSA is received from a non-participant adjacency, then the LSA is flooded (i.e., forwarded) to all member adjacencies and non-participant adjacencies except the adjacency on which the LSA is received (i.e., the LSA is forwarded to each adjacent member node and to each adjacent non-participant node except the sending node) (Block 409).
[0064] If the LSA is received on a member adjacency, then a determination is made whether the member adjacency is upstream, downstream, or both within the constrained flooding topology, where upstream is toward both roots, downstream is away from both roots, and upstream/downstream is a split between the two roots (Block 411). A new LSA received from an upstream member adjacency is relayed to all downstream member adjacencies, irrespective of which spanning tree the adjacencies are part of (i.e., the LSA is forwarded to each downstream adjacent member node) (Block 415). In addition, the LSA is flooded to each non-participant adjacency (i.e., to each adjacent non-participating node).
[0065] A new LSA received from a downstream member adjacency is flooded on all other member adjacencies exclusive of the adjacency of arrival irrespective of which spanning tree the adjacencies are part of (Block 417). In addition, the LSA is flooded on all non-participating adjacencies (i.e., the LSA is forwarded to each adjacent non-participating node).
[0066] A new LSA received from a member adjacency where upstream and downstream is ambiguous (it is an upstream member on one of the spanning trees and a downstream member on the other) is flooded on all other member adjacencies exclusive of the adjacency of arrival, irrespective of which spanning tree the adjacencies are part of (i.e., the LSA is forwarded to each adjacent member node) (Block 413). In addition, the LSA is flooded on all non-participating adjacencies (i.e., the LSA is forwarded to each adjacent non-participating node).
[0067] Figure 4B is a flowchart of one embodiment of a process for handling LSAs generated by local events. If the LSA to be processed by a node is generated as a local event (Block 451), rather than being received from an adjacent node, then the process floods the LSA on all member adjacencies and on all non-participating adjacencies (i.e., the LSA is forwarded to each adjacent node) (Block 453).
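Taken together, the rules of Figures 4A and 4B amount to a dispatch on the class of the arrival adjacency. The sketch below reuses the adjacency_roles classification from the earlier fragment (stored as node.roles) and assumes hypothetical helpers has_seen, store, send, and the adjacency set accessors; it is an illustration of the stated rules, not a complete IGP implementation.

def handle_received_lsa(node, lsa, arrival):
    if node.has_seen(lsa):
        return                        # Block 405: duplicates are never relayed
    node.store(lsa)
    members = node.member_adjacencies()
    non_participants = node.non_participant_adjacencies()
    role = node.roles.get(arrival)    # None for an adjacency that is not a member adjacency
    if role is None:                  # Block 409
        targets = (members | non_participants) - {arrival}
    elif role == "upstream":          # Block 415
        targets = node.downstream_member_adjacencies() | non_participants
    else:                             # Blocks 413/417: downstream or ambiguous
        targets = (members - {arrival}) | non_participants
    node.send(lsa, targets)

def handle_local_lsa(node, lsa):
    # Figure 4B: locally generated LSAs go to every member and non-participant adjacency.
    node.send(lsa, node.member_adjacencies() | node.non_participant_adjacencies())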
[0068] Node Additions
[0069] A participating node that is added to the constrained flooding topology will initially not be served by the constrained flooding topology. A participating node adjacent to that node can treat it as a non-participating node until such time as tree re-optimization has completed. At the end of tree optimization, typically two adjacent participating nodes will have member adjacencies with the new node, so the ability to flood LSAs between the new node and the constrained flooding topology will have been uninterrupted during the process.
[0070] Interactions between Participating and Non-Participating Nodes
[0071] The embodiments address nodal behaviors with respect to constraining flooding to member adjacencies. To address the scenario where the participating nodes are a subset of a larger network, it is possible to advertise the capability to participate in flood reduction. In this embodiment, each participating node uses this information to identify the set of participating adjacencies and confine the spanning tree computation to that set in order to identify its local set of member adjacencies.
[0072] A node that has a combination of participating and non-participating adjacencies handles this as follows: for any new LSA received on a participating adjacency, in addition to the rules for member adjacencies, it also floods the LSA on all non-participating adjacencies. For any new LSA received on a non-participating adjacency, it floods the LSA on all member adjacencies.
[0073] Multiple Flooding Domains and the Severing of Flooding Domains
[0074] In some embodiments, there can be sets of participating nodes that are not contiguously connected via participating adjacencies in a given IGP domain. For example, a node may have been incorrectly configured as a participating node but have no participating adjacencies, or a participating node or set of nodes may have become severed from the constrained flooding topology but still be connected to other nodes in the network. Nodes in this set would still be able to compute a local extension of the constrained flooding topology, but it would only be useful if the set was sufficiently large that a majority of the nodes were not connected to non-participants. In some embodiments, procedures are designed to permit more than one constrained flooding topology in an IGP domain; in that case, participating nodes would have to be administratively configured to associate with a constrained flooding topology instance.
[0075] Constrained Flooding Topology Re-Optimization
[0076] After a topology change, it is desirable that the constrained flooding topology remain stable until the network has stabilized. However, a single failure may render one of the spanning trees incomplete, such that a further single failure could make the constrained flooding topology incomplete. Therefore, the embodiments may include re-optimization of the constrained flooding topology after a topology change. In order to maintain complete convergence, the process may not recompute the spanning trees simultaneously. In the embodiments described above, the computations of spanning trees are separated in time. Re-optimization of the low spanning tree does not take place at the same time as re-optimization of the high spanning tree. The embodiments can re-optimize an incomplete tree first; however, this would require the participating nodes to maintain a complete map of all member adjacencies so that a common determination of the most degraded spanning tree, and hence the order of re-optimization, could be made.
[0077] Node and Network Initialization
[0078] A participating node at power up will not be able to establish member links until it has synchronized with the network and the network is stable in the new topology. This node can treat a power up similarly to how a topology change and network re-optimization is treated, the only difference being that it will flood all LSAs received or originated until both spanning trees have stabilized.
[0079] Loop Prevention
[0080] IEEE 802.1aq included additional mechanisms to prevent looping: a reverse path forwarding check, and digest exchange across adjacencies to ensure IGP synchronization. Routing LSAs are not relayed if they are duplicates; therefore, destructive looping cannot occur and additional mitigation mechanisms are not required.
[0081] Pathological Failure Scenarios
[0082] In a stable, fault-free network with sufficient mesh density of the types considered, the constrained flooding topology used by the constrained LSA flooding process ensures that no single failure renders both spanning trees incomplete.
[0083] In a tree network of sufficient mesh density, the only dual link failure that can render the constrained flooding topology incomplete is if a participant node has failures in both upstream member adjacencies. This can be partially mitigated if the node recognizes this scenario and reverts to flooding on all adjacencies. In the constrained LSA flooding process, surrounding participating nodes that receive the LSA on a non-member adjacency will introduce the LSA into the constrained flooding topology. A pathological scenario is the simultaneous failure of both roots. Root selection can place the roots two hops apart so there will be a constituency of participants that would observe a simultaneous failure of both upstream member adjacencies and revert to normal flooding.
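The reversion behavior can be expressed as a local check run whenever adjacency state changes; adjacency_up and revert_to_full_flooding are hypothetical hooks in this sketch.

def check_upstream_survivability(node):
    # A participant that has lost both upstream member adjacencies reverts to
    # flooding on all adjacencies; surrounding members that receive the LSA on
    # a non-member adjacency re-introduce it into the constrained flooding topology.
    upstream = [adj for adj, role in node.roles.items() if role == "upstream"]
    if upstream and not any(node.adjacency_up(adj) for adj in upstream):
        node.revert_to_full_flooding()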
[0084] Figure 5 is a diagram of one example embodiment of LSA forwarding for a subset of nodes where there is overlap between the low spanning tree and the high spanning tree. In this illustrative example, nodes 0 and 55 are the roots of the high and low spanning trees. A subset of the links of the high and low spanning trees is shown to illustrate a case of overlap on links between nodes 0 and 31 and between nodes 24 and 55. In such overlap cases, the nodes originating LSAs on these links need only send one copy of the LSA over the links, whereas otherwise two copies would be sent, one for each spanning tree. In addition, the roots can treat new LSAs on these links as upstream and flood them on their respective trees where they are the root.
[0085] Generalized Form
[0086] The embodiments provided herein above generate a constrained flooding topology using two diversely rooted spanning trees. In some of the above embodiments, the IEEE 802.1aq tie-breaking algorithm is utilized in the process of determining the diverse trees. A generalized form without these elements can be generally applied while ensuring that all nodes that have more than one interface are diversely bi-connected to a constrained flooding topology. The constrained flooding topology generation process described above produces diverse trees where most nodes are bi-connected. The additional embodiments generalize the constrained flooding topology generation process to increase the variety of topologies where the nodes will be bi-connected in the constrained flooding topology. In an entirely arbitrary physical network topology, the previously described process of constrained flooding topology generation will not necessarily result in all nodes being bi-connected to the constrained flooding topology. Nodes that are not bi-connected to a constrained flooding topology generated with the previously described process will have a physical adjacency that is an upstream member adjacency for both spanning trees. In most network topologies of reasonable size and graph density, the constrained flooding topology generated herein above will have only a small number of participant nodes that are not bi-connected to the constrained flooding topology. Nodes that have only a single upstream member adjacency to the constrained flooding topology, but that have more than one participant adjacency, are referred to herein as ‘needy nodes.’ Correcting these situations to ensure the constrained flooding topology will remain complete across any single failure in a reasonably meshed network of arbitrary topology can be accomplished by local establishment of a member adjacency between the needy node and one of the adjacent member nodes. The participant nodes immediately adjacent to the needy node can be referred to as constellation nodes. The embodiments provide, as part of the process of establishing the constrained flooding topology, for the needy node and the constellation nodes to independently determine their pairwise relationship and, where appropriate, install a local modification to the constrained flooding topology. This obviates the need for additional protocol exchange to coordinate constrained flooding topology modifications.
[0087] In the embodiments, after the computation of the pair of spanning trees, each of the participant nodes in the network determines whether it has a member adjacency that is an upstream adjacency for both spanning trees that make up the constrained flooding topology. If so, it is a needy node and is required to perform the needy node procedures. The participant nodes also check whether there are any adjacent needy nodes. For each adjacent needy node where the participant node is not the upstream member node in the flooding topology, the participant node will perform procedures for a constellation node. Nodes perform both checks as they may simultaneously be both constellation nodes and needy nodes.
[0088] Figure 6 is a flowchart of a process implemented by the participant nodes that self-identify as ‘needy’ nodes. This process is implemented by each node that self-identifies as a needy node as described herein above and is performed excluding from consideration any modification made to the spanning trees by needy node processing, to avoid race conditions. The node initiates the process by determining whether it has only a single participant adjacency (Block 601); if so, no further action is possible, and the process completes (Block 603). The node then checks whether it is physically bi-connected in the constrained flooding topology (Block 605). If the node is determined to be physically bi-connected to participant nodes, then the node modifies its downstream adjacency to include the role of upstream member adjacency (Block 623). If the node is not bi-connected, then a check is made to determine if there are further contiguous upstream single points of failure. Several single points of failure checks can be included in this determination. The immediately upstream node is a single point of failure for a needy node, and the needy node checks whether nodes further upstream form a contiguous chain of single points of failure (e.g., where there is a chain of physically bi-connected nodes or the node has only a single upstream adjacency in the constrained flooding topology). The node traces upstream until a node that is bi-connected to the constrained flooding topology is found or the root is reached (Block 607). When a bi-connected node is found, it is identified as the single point of failure to be avoided when alternate paths from the node are sought. In some cases, this may be the root.
[0089] The process continues to select paths to the root for the computing node (i.e., the needy node). For each of the constellation nodes of the computing needy node, the process constructs a list of node IDs for the upstream path to each of the roots (Block 609). If some but not all of these paths transit the node previously identified to be the single point of failure to be avoided, then these paths are removed from further consideration unless all remaining paths transit this single point of failure (Block 611). The process then eliminates any paths from further consideration that are longer than a shortest path to the closest root (Block 613).
[0090] If there is only a single remaining path (Block 615), then that path is selected
(Block 617). If multiple possible paths remain, then the process analyzes the remaining paths to select one of the remaining paths. Path identifiers are constructed for each of the remaining paths using lists of node identifiers, which are padded to create an equal number of hops in each path where needed. These path identifiers can then be lexicographically sorted, ranked, or similarly constructed to enable a deterministic selection of one of the remaining paths (Block 619). The adjacency of the selected path is marked with the path identifier as an upstream member adjacency (Block 621).
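The path filtering and tie breaking of Blocks 609 through 621 can be sketched as list processing. In this sketch, candidate_paths is assumed to map each constellation node to its list of candidate upstream paths (each a list of node-ids ending at a root), and spof is the single point of failure identified above; both names are illustrative.

def select_upstream_path(candidate_paths, spof):
    # Flatten the per-constellation-node path lists (Block 609).
    paths = [path for group in candidate_paths.values() for path in group]
    # Block 611: drop paths through the single point of failure unless all
    # remaining paths transit it.
    avoiding = [path for path in paths if spof not in path]
    if avoiding:
        paths = avoiding
    # Block 613: keep only paths no longer than the shortest remaining path.
    shortest = min(len(path) for path in paths)
    paths = [path for path in paths if len(path) == shortest]
    if len(paths) == 1:
        return paths[0]               # Blocks 615/617
    # Block 619: rank lexicographically sorted path identifiers. The surviving
    # paths are of equal length here, so the padding step is not needed.
    return min(paths, key=lambda path: sorted(path))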
[0091] Figure 7 is a flowchart of a process implemented by the nodes that self-identify as ‘constellation’ nodes. This process is implemented by each node that self-identifies as a constellation node as described herein above and is performed excluding from consideration any modification made to the spanning trees by the needy node processing, to avoid race conditions. If the corresponding needy node is physically singly connected to the network, then no other participant nodes will self-identify as a constellation node for that needy node. Otherwise, this process is similar to that of the needy node. The constellation node can initiate the process by determining if the needy node is physically bi-connected in the constrained flooding topology (Block 701). If the needy node is determined to be bi-connected, then the constellation node is the only constellation node and the adjacency to the corresponding needy node is modified to include the role of downstream member adjacency (Block 703). If the needy node is not bi-connected, then a check is made to determine the furthest contiguously connected single points of failure. As with the process of the needy node, there are several single points of failure checks that can be included in this determination. The node can check whether the upstream node is a single point of failure, and whether nodes further upstream are single points of failure (e.g., where there is a chain of physically bi-connected nodes or the node has only a single upstream adjacency in the constrained flooding topology). The node can trace upstream until a bi-connected node is found, or the root is reached (Block 707). When a bi-connected node is found, it is identified as the single point of failure to be avoided when alternate paths from the node are sought (note that this may be a root).
[0092] The process continues to identify alternate paths to the root for the needy node via the set of constellation nodes, including the constellation node performing the computation. For all constellation nodes, the process constructs a list of node IDs for the upstream path to each of the roots (Block 709). If some but not all of these paths transit the node that was previously identified to be the single point of failure to be avoided, then these paths are removed from further consideration (Block 711). The process then eliminates any paths from further consideration that are longer than a shortest path from the needy node to the given root
(Block 713).
[0093] If there is only a single remaining path (Block 715), then that path is selected
(Block 717). If multiple possible paths remain, then the process analyzes the remaining paths to select one of the remaining paths. Path identifiers are constructed for each of the remaining paths using lists of node identifiers, which are padded to create an equal number of hops in each path where needed. These path identifiers can then be lexicographically sorted, ranked, or similarly constructed to enable a deterministic selection of one of the remaining paths
(Block 719). If that path transits the computing constellation node, the adjacency with the needy node is marked as an upstream member adjacency (Block 721).
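Under the same assumptions, the constellation-side computation of Figure 7 can reuse the selection helper above; each constellation node runs the identical computation and acts only when the winning path transits itself.

def constellation_decision(self_id, candidate_paths, spof):
    # Only the constellation node on the selected path installs the member
    # adjacency toward the needy node (Block 721).
    chosen = select_upstream_path(candidate_paths, spof)
    return self_id in chosen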
[0094] Adaptations for Special Cases
[0095] Figure 8 is a diagram of an example network topology that illustrates a special constrained flooding case. In this network topology there are eight participant nodes, numbered 0-3, 5-6, 8, and 10. The two spanning trees of the constrained flooding topology are rooted at participant nodes 1 and 2. All links in this example are equal cost, and only a first copy of a received LSA is reflooded. In this example, an LSA originates from node 0. If the LSA is received by node 5 first on path 0-10-5, then it will be treated as arriving on a downstream adjacency and forwarded to nodes 2 and 6. If the LSA is received by node 5 first on path 0-6-5, then it will be treated as received from an upstream adjacency and forwarded to node 10. Thus, there are different behaviors at node 5 for forwarding received LSAs, which are dependent on which path has a faster propagation of the LSA. Since the node will not forward an already forwarded LSA, two of the nodes may not receive a copy of the LSA. Thus, if the LSA were advertising, for example, a link 0-1 failure, then the constrained flooding topology would be incomplete.
[0096] The embodiments can be adapted to handle cases where an LSA can be received on both an upstream and a downstream interface in random order. The adapted embodiment handles this case by tracking the class of interface of arrival for an LSA. The forwarding rules are adapted such that if an LSA is received that has not been seen before at a node, then the LSA is forwarded according to the class of arrival (i.e., based on upstream or downstream adjacency, where arrival on a non-member adjacency is considered to be the equivalent of a downstream arrival). If an LSA has been received before on an upstream adjacency and is received again on a downstream adjacency, then the LSA will be forwarded on the other upstream adjacencies.
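The adaptation amounts to remembering the adjacency and class of the first arrival per LSA. In the sketch below, first_arrival is a hypothetical per-node map keyed by LSA identifier, and forward_by_class stands in for the Figure 4A forwarding of the earlier fragment.

def handle_lsa_with_class_tracking(node, lsa, arrival):
    arrival_class = node.roles.get(arrival, "downstream")  # non-member arrival counts as downstream
    first = node.first_arrival.get(lsa.key)                # (adjacency, class) of the first copy
    if first is None:
        node.first_arrival[lsa.key] = (arrival, arrival_class)
        forward_by_class(node, lsa, arrival)               # the normal Figure 4A rules
    elif first[1] == "upstream" and arrival_class == "downstream":
        # A later copy arriving from below is relayed on the other upstream adjacencies.
        node.send(lsa, {adj for adj, role in node.roles.items()
                        if role == "upstream" and adj != first[0]})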
[0097] Adaptations for Root Selection
[0098] The embodiments presented herein above can also be adapted for more generalized application with adaptations to the root selection process. The root selection process can be a distributed process performed by all participant nodes. The root selection performs a search of the network topology for all node pairs that are two hops apart and have at least two equal and lowest cost two-hop paths between them, and no equal cost paths that span fewer or more than two hops; otherwise, flooding topology generation can produce undesirable results. Nodes with two hops between them have the properties that a simultaneous dual root failure is visible to the transit nodes and that the flooding diameter is minimized by minimizing the worst case, which would be the failure of an inter-root path. The resulting set of possible root pairs is then ranked for selection. The criteria for ranking root pairs can include maximizing the sum of the degree of connectedness of both roots, with a tie break minimizing the delta between the degrees of the two roots. This option provides a reduction in the number of needy nodes in the constrained flooding topology. Other criteria can include minimizing the distance from a root to the network edge, minimizing or maximizing a spread in root identifiers, minimizing or maximizing a path cost between the roots, and selecting a pair with the lowest or highest node identifiers. Any combination of these criteria can be used to rank or similarly sort the possible root node pairs and make a selection. All participant nodes that implement the root selection use the same criteria to implement the common ranking algorithm. In one example embodiment, the ranking of root pairs selects a root pair with the minimum path cost between them. If there is more than one such root pair, then the pair with the maximum sum of connectedness is selected. If further tie breaking is needed, then the pair with the minimum spread in degree of connectedness is selected, and then the pair with the lowest root id.
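The example ranking at the end of the paragraph maps onto a single sort key; the attribute names in this sketch (roots, path_cost, degree, node_id) are illustrative.

def rank_root_pairs(pairs):
    # Order candidate root pairs: minimum inter-root path cost, then maximum
    # summed connectedness, then minimum spread in degree, then lowest root id.
    def key(pair):
        a, b = pair.roots
        return (pair.path_cost,
                -(a.degree + b.degree),
                abs(a.degree - b.degree),
                min(a.node_id, b.node_id))
    return sorted(pairs, key=key)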
[0099] Constrained Flooding Topology Re-Optimization
[00100] In a generalized embodiment, the spanning trees are required to be re-optimized simultaneously rather than independently. Handling needy nodes in the generalized process utilizes both spanning trees; thus, both are optimized at the same time to avoid artifacts. In some embodiments, root selection can be included in the re-optimization process. During periods of instability related to re-optimization, the nodes can treat LSAs received from participant non-member adjacent nodes as received from downstream nodes, and forward them to all member adjacent nodes.
[00101] Metrics and Constrained Flooding Topology Construction
[00102] In the general case, use of a unary link cost for all links has a number of desirable properties: flooding topology (FT) generation will minimize the number of hops across the network, where per-hop processing of LSAs at each node is expected to dwarf transmission times, and hence is the desirable metric to optimize for; the number of needy nodes is minimized, as a unary metric tends to amplify path diversity; and root selection is simplified, as the hop count is also the cost, which eliminates a set of corner cases to be considered.
[00103] Architecture
[00104] Figure 9A illustrates connectivity between network devices (NDs) within an exemplary network, as well as three exemplary implementations of the NDs, according to some embodiments of the invention. Figure 9A shows NDs 900A-H, and their connectivity by way of lines between 900A-900B, 900B-900C, 900C-900D, 900D-900E, 900E-900F, 900F-900G, and 900A-900G, as well as between 900H and each of 900A, 900C, 900D, and 900G. These NDs are physical devices, and the connectivity between these NDs can be wireless or wired (often referred to as a link). An additional line extending from NDs 900A, 900E, and 900F illustrates that these NDs act as ingress and egress points for the network (and thus, these NDs are sometimes referred to as edge NDs; while the other NDs may be called core NDs).
[00105] Two of the exemplary ND implementations in Figure 9A are: 1) a special-purpose network device 902 that uses custom application-specific integrated-circuits (ASICs) and a special-purpose operating system (OS); and 2) a general-purpose network device 904 that uses common off-the-shelf (COTS) processors and a standard OS.
[00106] The special-purpose network device 902 includes networking hardware 910 comprising a set of one or more processor(s) 912, forwarding resource(s) 914 (which typically include one or more ASICs and/or network processors), and physical network interfaces (NIs) 916 (through which network connections are made, such as those shown by the connectivity between NDs 900A-H), as well as non-transitory machine readable storage media 918 having stored therein networking software 920. During operation, the networking software 920 may be executed by the networking hardware 910 to instantiate a set of one or more networking software instance(s) 922. Each of the networking software instance(s) 922, and that part of the networking hardware 910 that executes that network software instance (be it hardware dedicated to that networking software instance and/or time slices of hardware temporally shared by that networking software instance with others of the networking software instance(s) 922), form a separate virtual network element 930A-R. Each of the virtual network element(s) (VNEs) 930A-R includes a control communication and configuration module 932A-R (sometimes referred to as a local control module or control communication module) and forwarding table(s) 934A-R, such that a given virtual network element (e.g., 930A) includes the control communication and configuration module (e.g., 932A), a set of one or more forwarding table(s) (e.g., 934A), and that portion of the networking hardware 910 that executes the virtual network element (e.g., 930A).
[00107] The special-purpose network device 902 is often physically and/or logically considered to include: 1) a ND control plane 924 (sometimes referred to as a control plane) comprising the processor(s) 912 that execute the control communication and configuration module(s) 932A-R; and 2) a ND forwarding plane 926 (sometimes referred to as a forwarding plane, a data plane, or a media plane) comprising the forwarding resource(s) 914 that utilize the forwarding table(s) 934A-R and the physical NIs 916. By way of example, where the ND is a router (or is implementing routing functionality), the ND control plane 924 (the processor(s) 912 executing the control communication and configuration module(s) 932A-R) is typically responsible for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) and storing that routing information in the forwarding table(s) 934A-R, and the ND forwarding plane 926 is responsible for receiving that data on the physical NIs 916 and forwarding that data out the appropriate ones of the physical NIs 916 based on the forwarding table(s) 934A-R.
[00108] Figure 9B illustrates an exemplary way to implement the special-purpose network device 902 according to some embodiments of the invention. Figure 9B shows a special-purpose network device including cards 938 (typically hot pluggable). While in some embodiments the cards 938 are of two types (one or more that operate as the ND forwarding plane 926
(sometimes called line cards), and one or more that operate to implement the ND control plane 924 (sometimes called control cards)), alternative embodiments may combine
functionality onto a single card and/or include additional card types (e.g., one additional type of card is called a service card, resource card, or multi-application card). A service card can provide specialized processing (e.g., Layer 4 to Layer 7 services (e.g., firewall, Internet Protocol Security (IPsec), Secure Sockets Layer (SSL) / Transport Layer Security (TLS), Intrusion Detection System (IDS), peer-to-peer (P2P), Voice over IP (VoIP) Session Border Controller, Mobile Wireless Gateways (Gateway General Packet Radio Service (GPRS) Support Node (GGSN), Evolved Packet Core (EPC) Gateway)). By way of example, a service card may be used to terminate IPsec tunnels and execute the attendant authentication and encryption algorithms. These cards are coupled together through one or more interconnect mechanisms illustrated as backplane 936 (e.g., a first full mesh coupling the line cards and a second full mesh coupling all of the cards).
[00109] Returning to Figure 9A, the general-purpose network device 904 includes
hardware 940 comprising a set of one or more processor(s) 942 (which are often COTS processors) and physical NIs 946, as well as non-transitory machine-readable storage media 948 having stored therein software 950. During operation, the processor(s) 942 execute the software 950 to instantiate one or more sets of one or more applications 964A-R. While one embodiment does not implement virtualization, alternative embodiments may use different forms of virtualization. For example, in one such alternative embodiment the virtualization layer 954 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 962A-R called software containers that may each be used to execute one (or more) of the sets of applications 964A-R; where the multiple software containers (also called virtualization engines, virtual private servers, or jails) are user spaces (typically a virtual memory space) that are separate from each other and separate from the kernel space in which the operating system is run; and where the set of applications running in a given user space, unless explicitly allowed, cannot access the memory of the other processes. In another such alternative embodiment the virtualization layer 954 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and each of the sets of applications 964A-R is run on top of a guest operating system within an instance 962A-R called a virtual machine (which may in some cases be considered a tightly isolated form of software container) that is run on top of the hypervisor - the guest operating system and application may not know they are running on a virtual machine as opposed to running on a “bare metal” host electronic device, or through para-virtualization the operating system and/or application may be aware of the presence of virtualization for optimization purposes. In yet other alternative embodiments, one, some or all of the applications are implemented as unikernel(s), which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application. As a unikernel can be implemented to run directly on hardware 940, directly on a hypervisor (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container, embodiments can be implemented fully with unikernels running directly on a hypervisor represented by virtualization layer 954, unikernels running within software containers represented by instances 962A-R, or as a combination of unikernels and the above-described techniques (e.g., unikernels and virtual machines both run directly on a hypervisor, unikernels and sets of applications that are run in different software containers).
[00110] The instantiation of the one or more sets of one or more applications 964A-R, as well as virtualization if implemented, are collectively referred to as software instance(s) 952. Each set of applications 964A-R, corresponding virtualization construct (e.g., instance 962A-R) if implemented, and that part of the hardware 940 that executes them (be it hardware dedicated to that execution and/or time slices of hardware temporally shared), forms a separate virtual network element(s) 960A-R. In some embodiments, the constrained flooding manager 964A-R is an application that implements the processes described herein above.
[00111] The virtual network element(s) 960A-R perform similar functionality to the virtual network element(s) 930A-R - e.g., similar to the control communication and configuration module(s) 932A and forwarding table(s) 934A (this virtualization of the hardware 940 is sometimes referred to as network function virtualization (NFV)). Thus, NFV may be used to consolidate many network equipment types onto industry standard high-volume server hardware, physical switches, and physical storage, which could be located in data centers, NDs, and customer premise equipment (CPE). While embodiments of the invention are illustrated with each instance 962A-R corresponding to one VNE 960A-R, alternative embodiments may implement this correspondence at a finer level granularity (e.g., line card virtual machines virtualize line cards, control card virtual machines virtualize control cards, etc.); it should be understood that the techniques described herein with reference to a correspondence of instances 962A-R to VNEs also apply to embodiments where such a finer level of granularity and/or unikernels are used. In some embodiments, the constrained flooding manager 964A-R are implemented as part of the instances 922.
[00112] In certain embodiments, the virtualization layer 954 includes a virtual switch that provides similar forwarding services as a physical Ethernet switch. Specifically, this virtual switch forwards traffic between instances 962A-R and the physical NI(s) 946, as well as optionally between the instances 962A-R; in addition, this virtual switch may enforce network isolation between the VNEs 960A-R that by policy are not permitted to communicate with each other (e.g., by honoring virtual local area networks (VLANs)).
[00113] The third exemplary ND implementation in Figure 9A is a hybrid network device 906, which includes both custom ASICs/special-purpose OS and COTS processors/standard OS in a single ND or a single card within an ND. In certain embodiments of such a hybrid network device, a platform VM (i.e., a VM that implements the functionality of the special-purpose network device 902) could provide for para-virtualization to the networking hardware present in the hybrid network device 906.
[00114] Regardless of the above exemplary implementations of an ND, when a single one of multiple VNEs implemented by an ND is being considered (e.g., only one of the VNEs is part of a given virtual network) or where only a single VNE is currently being implemented by an ND, the shortened term network element (NE) is sometimes used to refer to that VNE. Also, in all of the above exemplary implementations, each of the VNEs (e.g., VNE(s) 930A-R, VNEs 960A-R, and those in the hybrid network device 906) receives data on the physical NIs (e.g., 916, 946) and forwards that data out the appropriate ones of the physical NIs (e.g., 916, 946). For example, a VNE implementing IP router functionality forwards IP packets on the basis of some of the IP header information in the IP packet, where IP header information includes source IP address, destination IP address, source port, destination port (where “source port” and “destination port” refer herein to protocol ports, as opposed to physical ports of a ND), transport protocol (e.g., user datagram protocol (UDP), Transmission Control Protocol (TCP)), and differentiated services code point (DSCP) values.
[00115] Figure 9C illustrates various exemplary ways in which VNEs may be coupled according to some embodiments of the invention. Figure 9C shows VNEs 970A.1-970A.P (and optionally VNEs 970A.Q-970A.R) implemented in ND 900A and VNE 970H.1 in ND 900H. In Figure 9C, VNEs 970A.1-P are separate from each other in the sense that they can receive packets from outside ND 900A and forward packets outside of ND 900A; VNE 970A.1 is coupled with VNE 970H.1, and thus they communicate packets between their respective NDs; VNEs 970A.2-970A.3 may optionally forward packets between themselves without forwarding them outside of the ND 900A; and VNE 970A.P may optionally be the first in a chain of VNEs that includes VNE 970A.Q followed by VNE 970A.R (this is sometimes referred to as dynamic service chaining, where each of the VNEs in the series of VNEs provides a different service - e.g., one or more layer 4-7 network services). While Figure 9C illustrates various exemplary relationships between the VNEs, alternative embodiments may support other relationships (e.g., more/fewer VNEs, more/fewer dynamic service chains, multiple different dynamic service chains with some common VNEs and some different VNEs).
[00116] The NDs of Figure 9A, for example, may form part of the Internet or a private network; and other electronic devices (not shown; such as end user devices including workstations, laptops, netbooks, tablets, palm tops, mobile phones, smartphones, phablets, multimedia phones, Voice Over Internet Protocol (VOIP) phones, terminals, portable media players, GPS units, wearable devices, gaming systems, set-top boxes, Internet enabled household appliances) may be coupled to the network (directly or through other networks such as access networks) to communicate over the network (e.g., the Internet or virtual private networks (VPNs) overlaid on (e.g., tunneled through) the Internet) with each other (directly or through servers) and/or access content and/or services. Such content and/or services are typically provided by one or more servers (not shown) belonging to a service/content provider or one or more end user devices (not shown) participating in a peer-to-peer (P2P) service, and may include, for example, public webpages (e.g., free content, store fronts, search services), private webpages (e.g., username/password accessed webpages providing email services), and/or corporate networks over VPNs. For instance, end user devices may be coupled (e.g., through customer premise equipment coupled to an access network (wired or wirelessly)) to edge NDs, which are coupled (e.g., through one or more core NDs) to other edge NDs, which are coupled to electronic devices acting as servers. However, through compute and storage virtualization, one or more of the electronic devices operating as the NDs in Figure 9A may also host one or more such servers (e.g., in the case of the general purpose network device 904, one or more of the software instances 962A-R may operate as servers; the same would be true for the hybrid network device 906; in the case of the special-purpose network device 902, one or more such servers could also be run on a virtualization layer executed by the processor(s) 912); in which case the servers are said to be co-located with the VNEs of that ND.
[00117] A virtual network is a logical abstraction of a physical network (such as that in Figure 9A) that provides network services (e.g., L2 and/or L3 services). A virtual network can be implemented as an overlay network (sometimes referred to as a network virtualization overlay) that provides network services (e.g., layer 2 (L2, data link layer) and/or layer 3 (L3, network layer) services) over an underlay network (e.g., an L3 network, such as an Internet Protocol (IP) network that uses tunnels (e.g., generic routing encapsulation (GRE), layer 2 tunneling protocol (L2TP), IPSec) to create the overlay network).
[00118] A network virtualization edge (NVE) sits at the edge of the underlay network and participates in implementing the network virtualization; the network-facing side of the NVE uses the underlay network to tunnel frames to and from other NVEs; the outward-facing side of the NVE sends and receives data to and from systems outside the network. A virtual network instance (VNI) is a specific instance of a virtual network on an NVE (e.g., a NE/VNE on an ND, a part of a NE/VNE on a ND where that NE/VNE is divided into multiple VNEs through emulation); one or more VNIs can be instantiated on an NVE (e.g., as different VNEs on an ND). A virtual access point (VAP) is a logical connection point on the NVE for connecting external systems to a virtual network; a VAP can be a physical or virtual port identified through a logical interface identifier (e.g., a VLAN ID).
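The NVE relationships above can be illustrated with a short, non-limiting sketch in which a VLAN ID on a VAP selects a VNI, and frames destined for systems behind a remote NVE are tunneled over the underlay; the tables, addresses, and function below are hypothetical, not taken from any embodiment.

```python
from typing import Dict, Tuple

# Hypothetical tables: VAP (keyed here by VLAN ID) -> VNI, and
# (VNI, destination MAC) -> underlay IP address of the remote NVE.
VAP_TO_VNI: Dict[int, str] = {100: "vni-blue", 200: "vni-red"}
REMOTE_NVE: Dict[Tuple[str, str], str] = {("vni-blue", "mac-b"): "192.0.2.7"}

def nve_forward(vlan_id: int, dst_mac: str, frame: bytes):
    vni = VAP_TO_VNI[vlan_id]                # the VAP selects the VNI
    peer = REMOTE_NVE.get((vni, dst_mac))
    if peer is None:
        # destination attached to this NVE: hand off on the outward side
        return ("deliver-local", vni, frame)
    # network-facing side: tunnel over the underlay (e.g., GRE/L2TP/IPSec)
    return ("tunnel-to", peer, vni, frame)

assert nve_forward(100, "mac-b", b"frame")[0] == "tunnel-to"
```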
[00119] Examples of network services include: 1) an Ethernet LAN emulation service (an Ethernet-based multipoint service similar to an Internet Engineering Task Force (IETF) Multiprotocol Label Switching (MPLS) or Ethernet VPN (EVPN) service) in which external systems are interconnected across the network by a LAN environment over the underlay network (e.g., an NVE provides separate L2 VNIs (virtual switching instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network); and 2) a virtualized IP forwarding service (similar to IETF IP VPN (e.g., Border Gateway Protocol (BGP)/MPLS IP VPN) from a service definition perspective) in which external systems are interconnected across the network by an L3 environment over the underlay network (e.g., an NVE provides separate L3 VNIs (forwarding and routing instances) for different such virtual networks, and L3 (e.g., IP/MPLS) tunneling encapsulation across the underlay network). Network services may also include quality of service capabilities (e.g., traffic classification marking, traffic conditioning and scheduling), security capabilities (e.g., filters to protect customer premises from network-originated attacks, to avoid malformed route announcements), and management capabilities (e.g., fault detection and processing).
[00120] Figure 9D illustrates a network with a single network element on each of the NDs of Figure 9A, and within this straightforward approach contrasts a traditional distributed approach (commonly used by traditional routers) with a centralized approach for maintaining reachability and forwarding information (also called network control), according to some embodiments of the invention. Specifically, Figure 9D illustrates network elements (NEs) 970A-H with the same connectivity as the NDs 900A-H of Figure 9A.
[00121] Figure 9D illustrates that the distributed approach 972 distributes responsibility for generating the reachability and forwarding information across the NEs 970A-H; in other words, the process of neighbor discovery and topology discovery is distributed.
[00122] For example, where the special-purpose network device 902 is used, the control communication and configuration module(s) 932A-R of the ND control plane 924 typically include a reachability and forwarding information module to implement one or more routing protocols (e.g., an exterior gateway protocol such as Border Gateway Protocol (BGP), Interior Gateway Protocol(s) (IGP) (e.g., Open Shortest Path First (OSPF), Intermediate System to Intermediate System (IS-IS), Routing Information Protocol (RIP), Label Distribution Protocol (LDP), Resource Reservation Protocol (RSVP) (including RSVP-Traffic Engineering (TE): Extensions to RSVP for LSP Tunnels and Generalized Multi-Protocol Label Switching
(GMPLS) Signaling RSVP-TE)) that communicate with other NEs to exchange routes, and then selects those routes based on one or more routing metrics. Thus, the NEs 970A-H (e.g., the processor(s) 912 executing the control communication and configuration module(s) 932A-R) perform their responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by
distributively determining the reachability within the network and calculating their respective forwarding information. Routes and adjacencies are stored in one or more routing structures (e.g., Routing Information Base (RIB), Label Information Base (LIB), one or more adjacency structures) on the ND control plane 924. The ND control plane 924 programs the ND forwarding plane 926 with information (e.g., adjacency and route information) based on the routing structure(s). For example, the ND control plane 924 programs the adjacency and route information into one or more forwarding table(s) 934A-R (e.g., Forwarding Information Base (FIB), Label Forwarding Information Base (LFIB), and one or more adjacency structures) on the ND forwarding plane 926. For layer 2 forwarding, the ND can store one or more bridging tables that are used to forward data based on the layer 2 information in that data. While the above example uses the special-purpose network device 902, the same distributed approach 972 can be implemented on the general-purpose network device 904 and the hybrid network device 906.
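As a non-limiting sketch of the distributed approach 972, each NE can be thought of as running a shortest path first computation over the link-state topology it has learned from the routing protocol and then programming the resulting next hops into its FIB; the topology, costs, and data structures below are invented for illustration and real RIB/FIB structures are far richer.

```python
import heapq

def spf(topology, source):
    """Dijkstra over {node: {neighbor: cost}}; returns {dest: next_hop}."""
    dist, next_hop, visited = {source: 0}, {}, set()
    heap = [(0, source, None)]
    while heap:
        d, node, first_hop = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if first_hop is not None:
            next_hop[node] = first_hop
        for nbr, cost in topology.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # the first hop is the neighbor itself when leaving the source
                heapq.heappush(heap, (nd, nbr, nbr if node == source else first_hop))
    return next_hop

# Invented three-node topology; the control plane computes next hops and
# would then program them into the forwarding plane (FIB).
topology = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1}, "C": {"A": 4, "B": 1}}
fib = spf(topology, "A")
assert fib == {"B": "B", "C": "B"}   # A reaches C via B (cost 2 < 4)
```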
[00123] Figure 9D illustrates a centralized approach 974 (also known as software defined networking (SDN)) that decouples the system that makes decisions about where traffic is sent from the underlying systems that forward traffic to the selected destination. The illustrated centralized approach 974 has the responsibility for the generation of reachability and forwarding information in a centralized control plane 976 (sometimes referred to as an SDN control module, controller, network controller, OpenFlow controller, SDN controller, control plane node, network virtualization authority, or management control entity), and thus the process of neighbor discovery and topology discovery is centralized. The centralized control plane 976 has a south bound interface 982 with a data plane 980 (sometimes referred to as the infrastructure layer, network forwarding plane, or forwarding plane (which should not be confused with a ND forwarding plane)) that includes the NEs 970A-H (sometimes referred to as switches, forwarding elements, data plane elements, or nodes). The centralized control plane 976 includes a network controller 978, which includes a centralized reachability and forwarding information module 979 that determines the reachability within the network and distributes the forwarding information to the NEs 970A-H of the data plane 980 over the south bound interface 982 (which may use the OpenFlow protocol). Thus, the network intelligence is centralized in the centralized control plane 976 executing on electronic devices that are typically separate from the NDs.
[00124] For example, where the special-purpose network device 902 is used in the data plane 980, each of the control communication and configuration module(s) 932A-R of the ND control plane 924 typically include a control agent that provides the VNE side of the south bound interface 982. In this case, the ND control plane 924 (the processor(s) 912 executing the control communication and configuration module(s) 932A-R) performs its responsibility for participating in controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) through the control agent communicating with the centralized control plane 976 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 979 (it should be understood that in some embodiments of the invention, the control communication and configuration module(s) 932A-R, in addition to communicating with the centralized control plane 976, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach; such embodiments are generally considered to fall under the centralized approach 974, but may also be considered a hybrid approach).
[00125] While the above example uses the special-purpose network device 902, the same centralized approach 974 can be implemented with the general purpose network device 904 (e.g., each of the VNE 960A-R performs its responsibility for controlling how data (e.g., packets) is to be routed (e.g., the next hop for the data and the outgoing physical NI for that data) by communicating with the centralized control plane 976 to receive the forwarding information (and in some cases, the reachability information) from the centralized reachability and forwarding information module 979; it should be understood that in some embodiments of the invention, the VNEs 960A-R, in addition to communicating with the centralized control plane 976, may also play some role in determining reachability and/or calculating forwarding information - albeit less so than in the case of a distributed approach) and the hybrid network device 906. In fact, the use of SDN techniques can enhance the NFV techniques typically used in the general-purpose network device 904 or hybrid network device 906 implementations as NFV is able to support SDN by providing an infrastructure upon which the SDN software can be run, and NFV and SDN both aim to make use of commodity server hardware and physical switches.
[00126] Figure 9D also shows that the centralized control plane 976 has a north bound interface 984 to an application layer 986, in which resides application(s) 988. The centralized control plane 976 has the ability to form virtual networks 992 (sometimes referred to as a logical forwarding plane, network services, or overlay networks (with the NEs 970A-H of the data plane 980 being the underlay network)) for the application(s) 988. Thus, the centralized control plane 976 maintains a global view of all NDs and configured NEs/VNEs, and it maps the virtual networks to the underlying NDs efficiently (including maintaining these mappings as the physical network changes either through hardware (ND, link, or ND component) failure, addition, or removal). In some embodiments, the applications 988 include the constrained flooding manager 981 that implements the functions described herein.
[00127] While Figure 9D shows the distributed approach 972 separate from the centralized approach 974, the effort of network control may be distributed differently or the two combined in certain embodiments of the invention. For example: 1) embodiments may generally use the centralized approach (SDN) 974, but have certain functions delegated to the NEs (e.g., the distributed approach may be used to implement one or more of fault monitoring, performance monitoring, protection switching, and primitives for neighbor and/or topology discovery); or 2) embodiments of the invention may perform neighbor discovery and topology discovery via both the centralized control plane and the distributed protocols, and the results compared to raise exceptions where they do not agree. Such embodiments are generally considered to fall under the centralized approach 974 but may also be considered a hybrid approach.
[00128] While Figure 9D illustrates the simple case where each of the NDs 900A-H implements a single NE 970A-H, it should be understood that the network control approaches described with reference to Figure 9D also work for networks where one or more of the
NDs 900A-H implement multiple VNEs (e.g., VNEs 930A-R, VNEs 960A-R, those in the hybrid network device 906). Alternatively, or in addition, the network controller 978 may also emulate the implementation of multiple VNEs in a single ND. Specifically, instead of (or in addition to) implementing multiple VNEs in a single ND, the network controller 978 may present the implementation of a VNE/NE in a single ND as multiple VNEs in the virtual networks 992 (all in the same one of the virtual network(s) 992, each in different ones of the virtual network(s) 992, or some combination). For example, the network controller 978 may cause an ND to implement a single VNE (a NE) in the underlay network, and then logically divide up the resources of that NE within the centralized control plane 976 to present different VNEs in the virtual network(s) 992 (where these different VNEs in the overlay networks are sharing the resources of the single VNE/NE implementation on the ND in the underlay network).
[00129] On the other hand, Figures 9E and 9F respectively illustrate exemplary abstractions of NEs and VNEs that the network controller 978 may present as part of different ones of the virtual networks 992. Figure 9E illustrates the simple case of where each of the NDs 900A-H implements a single NE 970A-H (see Figure 9D), but the centralized control plane 976 has abstracted multiple of the NEs in different NDs (the NEs 970A-C and G-H) into (to represent) a single NE 970I in one of the virtual network(s) 992 of Figure 9D, according to some
embodiments of the invention. Figure 9E shows that in this virtual network, the NE 970I is coupled to NE 970D and 970F, which are both still coupled to NE 970E.
[00130] Figure 9F illustrates a case where multiple VNEs (VNE 970A.1 and VNE 970H.1) are implemented on different NDs (ND 900A and ND 900H) and are coupled to each other, and where the centralized control plane 976 has abstracted these multiple VNEs such that they appear as a single VNE 970T within one of the virtual networks 992 of Figure 9D, according to some embodiments of the invention. Thus, the abstraction of a NE or VNE can span multiple NDs.
[00131] While some embodiments of the invention implement the centralized control plane 976 as a single entity (e.g., a single instance of software running on a single electronic device), alternative embodiments may spread the functionality across multiple entities for redundancy and/or scalability purposes (e.g., multiple instances of software running on different electronic devices).
[00132] Similar to the network device implementations, the electronic device(s) running the centralized control plane 976, and thus the network controller 978 including the centralized reachability and forwarding information module 979, may be implemented in a variety of ways (e.g., a special purpose device, a general-purpose (e.g., COTS) device, or hybrid device). These electronic device(s) would similarly include processor(s), a set of one or more physical NIs, and a non-transitory machine-readable storage medium having stored thereon the centralized control plane software. For instance, Figure 10 illustrates a general-purpose control plane device 1004 including hardware 1040 comprising a set of one or more processor(s) 1042 (which are often COTS processors) and physical NIs 1046, as well as non-transitory machine-readable storage media 1048 having stored therein centralized control plane (CCP) software 1050.
[00133] In embodiments that use compute virtualization, the processor(s) 1042 typically execute software to instantiate a virtualization layer 1054 (e.g., in one embodiment the virtualization layer 1054 represents the kernel of an operating system (or a shim executing on a base operating system) that allows for the creation of multiple instances 1062A-R called software containers (representing separate user spaces and also called virtualization engines, virtual private servers, or jails) that may each be used to execute a set of one or more
applications; in another embodiment the virtualization layer 1054 represents a hypervisor (sometimes referred to as a virtual machine monitor (VMM)) or a hypervisor executing on top of a host operating system, and an application is run on top of a guest operating system within an instance 1062A-R called a virtual machine (which in some cases may be considered a tightly isolated form of software container) that is run by the hypervisor; in another embodiment, an application is implemented as a unikernel, which can be generated by compiling directly with an application only a limited set of libraries (e.g., from a library operating system (LibOS) including drivers/libraries of OS services) that provide the particular OS services needed by the application, and the unikernel can run directly on hardware 1040, directly on a hypervisor represented by virtualization layer 1054 (in which case the unikernel is sometimes described as running within a LibOS virtual machine), or in a software container represented by one of instances 1062A-R). Again, in embodiments where compute virtualization is used, during operation an instance of the CCP software 1050 (illustrated as CCP instance 1079A) is executed (e.g., within the instance 1062A) on the virtualization layer 1054. In embodiments where compute virtualization is not used, the CCP instance 1079A is executed, as a unikernel or on top of a host operating system, on the "bare metal" general purpose control plane device 1004. The instantiation of the CCP instance 1079A, as well as the virtualization layer 1054 and
instances 1062A-R if implemented, are collectively referred to as software instance(s) 1052.
[00134] In some embodiments, the CCP instance 1079A includes a network controller instance 1078. The network controller instance 1078 includes a centralized reachability and forwarding information module instance 1079 (which is a middleware layer providing the context of the network controller 1078 to the operating system and communicating with the various NEs), and a CCP application layer 1080 (sometimes referred to as an application layer) over the middleware layer (providing the intelligence required for various network operations such as protocols, network situational awareness, and user interfaces). At a more abstract level, this CCP application layer 1080 within the centralized control plane 1076 works with virtual network view(s) (logical view(s) of the network) and the middleware layer provides the conversion from the virtual networks to the physical view. In some embodiments, the network controller instance 1078 implements the constrained flooding manager 1081 that implements the functions described herein.
[00135] The centralized control plane 1076 transmits relevant messages to the data plane 1080 based on CCP application layer 1080 calculations and middleware layer mapping for each flow. A flow may be defined as a set of packets whose headers match a given pattern of bits; in this sense, traditional IP forwarding is also flow-based forwarding where the flows are defined by the destination IP address, for example; however, in other implementations, the given pattern of bits used for a flow definition may include more fields (e.g., 10 or more) in the packet headers. Different NDs/NEs/VNEs of the data plane 1080 may receive different messages, and thus different forwarding information. The data plane 1080 processes these messages and programs the appropriate flow information and corresponding actions in the forwarding tables (sometimes referred to as flow tables) of the appropriate NE/VNEs, and then the NEs/VNEs map incoming packets to flows represented in the forwarding tables and forward packets based on the matches in the forwarding tables.
[00136] Standards such as OpenFlow define the protocols used for the messages, as well as a model for processing the packets. The model for processing packets includes header parsing, packet classification, and making forwarding decisions. Header parsing describes how to interpret a packet based upon a well-known set of protocols. Some protocol fields are used to build a match structure (or key) that will be used in packet classification (e.g., a first key field could be a source media access control (MAC) address, and a second key field could be a destination MAC address).
[00137] Packet classification involves executing a lookup in memory to classify the packet by determining which entry (also referred to as a forwarding table entry or flow entry) in the forwarding tables best matches the packet based upon the match structure, or key, of the forwarding table entries. It is possible that many flows represented in the forwarding table entries can correspond/match to a packet; in this case the system is typically configured to determine one forwarding table entry from the many according to a defined scheme (e.g., selecting a first forwarding table entry that is matched). Forwarding table entries include both a specific set of match criteria (a set of values or wildcards, or an indication of what portions of a packet should be compared to a particular value/values/wildcards, as defined by the matching capabilities - for specific fields in the packet header, or for some other packet content), and a set of one or more actions for the data plane to take on receiving a matching packet. For example, an action may be to push a header onto the packet, forward the packet using a particular port, flood the packet, or simply drop the packet. Thus, a forwarding table entry for IPv4/IPv6 packets with a particular transmission control protocol (TCP) destination port could contain an action specifying that these packets should be dropped.
[00138] Making forwarding decisions and performing actions occurs, based upon the forwarding table entry identified during packet classification, by executing the set of actions identified in the matched forwarding table entry on the packet.
[00139] However, when an unknown packet (for example, a "missed packet" or a "match-miss" as used in OpenFlow parlance) arrives at the data plane 1080, the packet (or a subset of the packet header and content) is typically forwarded to the centralized control plane 1076. The centralized control plane 1076 will then program forwarding table entries into the data plane 1080 to accommodate packets belonging to the flow of the unknown packet. Once a specific forwarding table entry has been programmed into the data plane 1080 by the centralized control plane 1076, the next packet with matching credentials will match that forwarding table entry and take the set of actions associated with that matched entry.
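The classification and match-miss behavior of paragraphs [00137]-[00139] can be illustrated with the following toy model (Python; the structures are invented for illustration and are not the OpenFlow wire format): each entry carries match criteria over header fields, with wildcards, plus actions; the first matching entry is selected; and a miss punts to the centralized control plane.

```python
# Toy classification model: entries = match criteria + actions.
WILDCARD = "*"

def matches(entry_match: dict, pkt: dict) -> bool:
    # A field matches if the entry holds a wildcard or the exact value.
    return all(v == WILDCARD or pkt.get(k) == v for k, v in entry_match.items())

def classify(table: list, pkt: dict):
    for entry in table:                      # "select the first entry matched"
        if matches(entry["match"], pkt):
            return entry["actions"]
    return ["punt-to-controller"]            # match-miss: forward to control plane

table = [
    {"match": {"ip_proto": "TCP", "tcp_dst": 23}, "actions": ["drop"]},
    {"match": {"eth_dst": "aa:bb:cc:dd:ee:ff", "ip_proto": WILDCARD},
     "actions": ["output:2"]},
]
pkt = {"eth_dst": "aa:bb:cc:dd:ee:ff", "ip_proto": "TCP", "tcp_dst": 23}
assert classify(table, pkt) == ["drop"]      # first matching entry wins
assert classify(table, {"eth_dst": "00:00:00:00:00:01"}) == ["punt-to-controller"]
```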
[00140] A network interface (NI) may be physical or virtual; and in the context of IP, an interface address is an IP address assigned to a NI, be it a physical NI or virtual NI. A virtual NI may be associated with a physical NI, with another virtual interface, or stand on its own (e.g., a loopback interface, a point-to-point protocol interface). A NI (physical or virtual) may be numbered (a NI with an IP address) or unnumbered (a NI without an IP address). A loopback interface (and its loopback address) is a specific type of virtual NI (and IP address) of a
NE/VNE (physical or virtual) often used for management purposes; where such an IP address is referred to as the nodal loopback address. The IP address(es) assigned to the NI(s) of a ND are referred to as IP addresses of that ND; at a more granular level, the IP address(es) assigned to NI(s) assigned to a NE/VNE implemented on a ND can be referred to as IP addresses of that NE/VNE.
[00141] Next hop selection by the routing system for a given destination may resolve to one path (that is, a routing protocol may generate one next hop on a shortest path); but if the routing system determines there are multiple viable next hops (that is, the routing protocol generated forwarding solution offers more than one next hop on a shortest path - multiple equal cost next hops), some additional criteria are used - for instance, in a connectionless network, Equal Cost Multi Path (ECMP) (also known as Equal Cost Multi Pathing, multipath forwarding and IP multipath) may be used (e.g., typical implementations use as the criteria particular header fields to ensure that the packets of a particular packet flow are always forwarded on the same next hop to preserve packet flow ordering). For purposes of multipath forwarding, a packet flow is defined as a set of packets that share an ordering constraint. As an example, the set of packets in a particular TCP transfer sequence need to arrive in order, else the TCP logic will interpret the out of order delivery as congestion and slow the TCP transfer rate down.
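As a non-limiting illustration of the multipath criteria just described, the sketch below hashes a handful of header fields so that all packets of a flow resolve to the same next hop, preserving ordering; the field selection and hash function are assumptions for this example, not a required scheme.

```python
import hashlib

def ecmp_next_hop(pkt: dict, next_hops: list) -> str:
    # Hash the flow-identifying fields so one flow always picks one next hop.
    flow = (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
            pkt.get("src_port", 0), pkt.get("dst_port", 0))
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]

pkt = {"src_ip": "10.0.0.1", "dst_ip": "10.9.9.9", "proto": 6,
       "src_port": 33333, "dst_port": 443}
hops = ["nh-1", "nh-2"]
# Packets of the same flow always hash to the same next hop:
assert ecmp_next_hop(pkt, hops) == ecmp_next_hop(dict(pkt), hops)
```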
[00142] While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims.
The description is thus to be regarded as illustrative instead of limiting.
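For orientation before the claims, the following is a minimal sketch of the constrained flooding topology recited in claim 1 below: two spanning trees with distinct roots whose union of adjacencies is the only set of links on which LSAs are flooded. Breadth-first search is used purely for illustration; the claims do not prescribe how the spanning trees are computed, and the topology below is invented.

```python
from collections import deque

def spanning_tree_edges(topology, root):
    """BFS spanning tree over {node: [neighbors]}; returns undirected edges."""
    seen, edges, q = {root}, set(), deque([root])
    while q:
        node = q.popleft()
        for nbr in topology.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                edges.add(frozenset((node, nbr)))   # undirected adjacency
                q.append(nbr)
    return edges

def constrained_flooding_topology(topology, root1, root2):
    # Union of the two trees' adjacencies = the constrained flooding topology.
    return spanning_tree_edges(topology, root1) | spanning_tree_edges(topology, root2)

topology = {"A": ["B", "C"], "B": ["A", "C", "D"],
            "C": ["A", "B", "D"], "D": ["B", "C"]}
flood_links = constrained_flooding_topology(topology, "A", "D")
# An LSA arriving at a node is re-flooded only on that node's links in
# flood_links, rather than on every adjacency as in classic flooding.
print(sorted(tuple(sorted(e)) for e in flood_links))
```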

Claims

What is claimed is:
1. A method of managing forwarding of link state advertisements in a network that uses an interior gateway protocol, the method comprising:
computing (311) a first spanning tree with a first root for participant nodes in the network;
computing (315) a second spanning tree with a second root for the participant nodes in the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network; and
flooding link state advertisements (LSAs) (319) on the constrained flooding topology according to a set of forwarding rules.
2. The method of claim 1, further comprising:
initializing (301) a first timer and a second timer to track quiescence of LSAs; and
determining (309) a root for the first spanning tree and the second spanning tree in response to expiration of the first timer.
3. The method of claim 1, further comprising:
receiving (401) an LSA from an upstream adjacency; and
flooding (415) the LSA to all downstream member adjacencies and to all non-participating adjacent nodes, where member adjacencies are adjacencies with participating interior gateway protocol speaking nodes that are a part of the constrained flooding topology, and non-participating nodes are interior gateway protocol speakers that are not a part of the constrained flooding topology.
4. The method of claim 1, further comprising:
receiving an LSA from a downstream member adjacency or any non-member adjacency, where the LSA was not previously received on a downstream member adjacency or a non-member adjacency;
flooding the LSA on other upstream member adjacencies, where the LSA was previously received from an upstream member adjacency; and
flooding the LSA on all member adjacencies and to all non-participating adjacent nodes, exclusive of an adjacency of arrival, where the LSA was not previously received on an upstream adjacency, and where member adjacencies are adjacencies with participating interior gateway protocol speaking nodes that are a part of the constrained flooding topology, and non-participating nodes are interior gateway protocol speakers that are not a part of the constrained flooding topology.
5. The method of claim 1, further comprising:
determining (605) whether a node is bi-connected to the constrained flooding topology; and
in response to the determination of the node not being bi-connected, by both the node and an immediately adjacent non-upstream participating node,
tracing (607, 707) upstream from the node to identify a bi-connected node or the root,
constructing (609, 709) a set of paths with each path as a list of node identifiers for an upstream path to each root for adjacent nodes other than the upstream node,
removing (611, 711) adjacent nodes that transit the identified bi-connected node to reach either root from the set of paths where there are alternative paths,
eliminating (613, 713) paths from the set of paths that are longer than a shortest path to either the first root or the second root,
selecting one path from the set of paths according to a sorting of the set of paths, and
marking an adjacency for the selected path as an upstream member adjacency between the node that is not bi-connected and an adjacent node of the selected path.
6. A network device to execute a method of managing forwarding of link state advertisements in an interior gateway protocol, the network device comprising:
a non-transitory computer readable medium (918) having stored therein a constrained flooding manager; and
a processor (912) coupled to the non-transitory computer readable medium, the processor to execute the constrained flooding manager, the constrained flooding manager to compute a first spanning tree with a first root for a network, to compute a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network and to flood link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
7. The network device of claim 6, wherein the constrained flooding manager is further to initialize a first timer and a second timer to track quiescence of LSAs, and wherein the constrained flooding manager is further to determine a root for the first spanning tree and the second spanning tree in response to expiration of the first timer.
8. The network device of claim 6, wherein the constrained flooding manager is to receive an LSA from an upstream adjacency, where the LSA was not previously received from an upstream member adjacency or a non-member adjacency, and to flood the LSA on all downstream member adjacencies and to all non-participating adjacent nodes, where member adjacencies are adjacencies with participating interior gateway protocol speaking nodes that are a part of the constrained flooding topology, and non-participating nodes are interior gateway protocol speakers that are not a part of the constrained flooding topology.
9. The network device of claim 6, wherein the constrained flooding manager is to receive an LSA, where the LSA was not previously received from a downstream member adjacency or a non-member adjacency, and if previously received from an upstream member adjacency, flooding the LSA on the other upstream member adjacency, to flood the LSA on all member adjacencies and to all non-participating adjacent nodes exclusive of an adjacency of arrival, where the LSA is not previously received on an upstream adjacency, and where member adjacencies are adjacencies with participating interior gateway protocol speaking nodes that are a part of the constrained flooding topology, and non-participating nodes are interior gateway protocol speakers that are not a part of the constrained flooding topology.
10. The network device of claim 6, wherein the constrained flooding manager is further to determine whether a node is bi-connected to the constrained flooding topology, and in response to the determination of the node not being bi-connected, by both the node that is not bi-connected and an immediately adjacent non-upstream participating node, tracing upstream from the node to identify a bi-connected node or the root, constructing a set of paths with each path as a list of node identifiers for an upstream path to each root for adjacent nodes, removing adjacent nodes that transit the identified bi-connected node to reach either root from the set of paths where alternative paths exist, eliminating paths from the set of paths that are longer than a shortest path, selecting one path from the set of paths according to a sorting of the set of paths, and marking an adjacency for the selected path from the node that is not bi-connected to an adjacent participating node of a selected path as an upstream member adjacency.
11. A computing device to implement a plurality of virtual machines, the plurality of virtual machines to implement network function virtualization (NFV), where at least one virtual machine from the plurality of virtual machines implements a method of managing forwarding of link state advertisements in an interior gateway protocol, the computing device comprising: a non-transitory computer readable medium (948) having stored therein a constrained flooding manager; and
a processor (942) coupled to the non-transitory computer readable medium, the processor to execute the at least one virtual machine from the plurality of virtual machines, the virtual machine to execute the constrained flooding manager, the constrained flooding manager to compute a first spanning tree with a first root for a network, to compute a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network, and to flood link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules.
12. The computing device of claim 11, wherein the constrained flooding manager is further to initialize a first timer and a second timer to track quiescence of LSAs, and wherein the constrained flooding manager is further to determine a root for the first spanning tree and the second spanning tree in response to expiration of the first timer.
13. The computing device of claim 11, wherein the constrained flooding manager is to receive an LSA from an upstream adjacency, where the LSA was not previously received from an upstream member adjacency or a non-member adjacency, and to flood the LSA on all downstream member adjacencies and to all non-participating adjacent nodes, where member adjacencies are adjacencies with participating interior gateway protocol speaking nodes that are a part of the constrained flooding topology, and non-participating nodes are interior gateway protocol speakers that are not a part of the constrained flooding topology.
14. The computing device of claim 11, wherein the constrained flooding manager is to receive an LSA, where the LSA was not previously received from a downstream member adjacency or a non-member adjacency, and if previously received from an upstream member adjacency, flooding the LSA on the other upstream member adjacency, to flood the LSA on all member adjacencies and to all non-participating adjacent nodes exclusive of an adjacency of arrival, where the LSA is not previously received on an upstream adjacency, and where member adjacencies are adjacencies with participating interior gateway protocol speaking nodes that are a part of the constrained flooding topology, and non-participating nodes are interior gateway protocol speakers that are not a part of the constrained flooding topology.
15. The computing device of claim 11, wherein the constrained flooding manager is further to determine whether a node is bi-connected to the constrained flooding topology, and in response to the determination of the node not being bi-connected, by both the node that is not bi-connected and an immediately adjacent non-upstream participating node, tracing upstream from the node to identify a bi-connected node or the root, constructing a set of paths with each path as a list of node identifiers for an upstream path to each root for adjacent nodes, removing adjacent nodes that transit the identified bi-connected node to reach either root from the set of paths, eliminating paths from the set of paths that are longer than a shortest path, selecting one path from the set of paths according to a sorting of the set of paths, and the node and selected adjacent node marking an adjacency for the selected path as an upstream member adjacency.
16. A control plane device in communication with a plurality of data plane nodes in a software defined networking (SDN) network, the control plane device to implement a method of managing forwarding of link state advertisements in an interior gateway protocol, the control plane device comprising:
a non-transitory computer readable medium (948) having stored therein a constrained flooding manager; and
a processor (942) coupled to the non-transitory computer readable medium, the processor to execute the constrained flooding manager, the constrained flooding manager to compute a first spanning tree with a first root for a network, to compute a second spanning tree with a second root for the network, the first spanning tree and the second spanning tree defining a constrained flooding topology for the network, to flood link state advertisements (LSAs) on the constrained flooding topology according to a set of forwarding rules, and to configure nodes in the SDN network to implement the constrained flooding topology.
17. The control plane device of claim 16, wherein the constrained flooding manager is further to initialize a first timer and a second timer to track quiescence of LSAs, and wherein the constrained flooding manager is further to determine a root for the first spanning tree and the second spanning tree in response to expiration of the first timer.
18. The control plane device of claim 16, wherein the constrained flooding manager is to receive an LSA from an upstream adjacency, where the LSA was not previously received from an upstream member adjacency or a non-member adjacency, and to flood the LSA on all downstream member adjacencies and to all non-participating adjacent nodes, where member adjacencies are adjacencies with participating interior gateway protocol speaking nodes that are a part of the constrained flooding topology, and non-participating nodes are interior gateway protocol speakers that are not a part of the constrained flooding topology.
19. The control plane device of claim 16, wherein the constrained flooding manager is to receive an LSA, where the LSA was not previously received from a downstream member adjacency or a non-member adjacency, and if previously received from an upstream member adjacency, flooding the LSA on the other upstream member adjacency, to flood the LSA on all member adjacencies and to all non-participating adjacent nodes exclusive of an adjacency of arrival, where the LSA is not previously received on an upstream adjacency, and where member adjacencies are adjacencies with participating interior gateway protocol speaking nodes that are a part of the constrained flooding topology, and non-participating nodes are interior gateway protocol speakers that are not a part of the constrained flooding topology.
20. The control plane device of claim 16, wherein the constrained flooding manager is further to determine whether a node is bi-connected to the constrained flooding topology, and in response to the determination that the node is not bi-connected, by both the node that is not bi-connected and the immediately adjacent non-upstream participating node, tracing upstream from the node to identify a bi-connected node or the root, constructing a set of paths with each path as a list of node identifiers for an upstream path to each root for adjacent nodes, removing adjacent nodes that transit the identified bi-connected node to reach either root from the set of paths, eliminating paths from the set of paths that are longer than a shortest path, selecting one path from the set of paths according to a sorting of the set of paths, and both the node and a selected adjacent node marking an adjacency for the selected path as an upstream member adjacency.
PCT/IB2019/051133 2019-02-12 2019-02-12 Limited flooding in dense graphs WO2020165627A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2019/051133 WO2020165627A1 (en) 2019-02-12 2019-02-12 Limited flooding in dense graphs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2019/051133 WO2020165627A1 (en) 2019-02-12 2019-02-12 Limited flooding in dense graphs

Publications (1)

Publication Number Publication Date
WO2020165627A1 true WO2020165627A1 (en) 2020-08-20

Family

ID=65763683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/051133 WO2020165627A1 (en) 2019-02-12 2019-02-12 Limited flooding in dense graphs

Country Status (1)

Country Link
WO (1) WO2020165627A1 (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAVE ALLAN ERICSSON: "A Distributed Algorithm for Constrained Flooding of IGP Advertisements; draft-allan-lsr-flooding-algorithm-00.txt", A DISTRIBUTED ALGORITHM FOR CONSTRAINED FLOODING OF IGP ADVERTISEMENTS; DRAFT-ALLAN-LSR-FLOODING-ALGORITHM-00.TXT; INTERNET-DRAFT: LSR WORKING GROUP, INTERNET ENGINEERING TASK FORCE, IETF; STANDARDWORKINGDRAFT, INTERNET SOCIETY (ISOC) 4, RUE DES FALA, 18 October 2018 (2018-10-18), pages 1 - 14, XP015129081 *
MARCEL CARIA ET AL: "Divide and Conquer: Partitioning OSPF networks with SDN", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 21 October 2014 (2014-10-21), XP081393120, DOI: 10.1109/INM.2015.7140324 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11977173B2 (en) 2019-11-27 2024-05-07 Rockwell Collins, Inc. Spoofing and denial of service detection and protection with doppler nulling (spatial awareness)
US11777844B2 (en) 2020-07-03 2023-10-03 Huawei Technologies Co., Ltd. Distributing information in communication networks
WO2022160666A1 (en) * 2021-01-26 2022-08-04 华为技术有限公司 Information flooding method and device
WO2022179421A1 (en) * 2021-02-25 2022-09-01 Huawei Technologies Co.,Ltd. Link state steering
US11757753B2 (en) 2021-02-25 2023-09-12 Huawei Technologies Co., Ltd. Link state steering
US11665658B1 (en) 2021-04-16 2023-05-30 Rockwell Collins, Inc. System and method for application of doppler corrections for time synchronized transmitter and receiver
US11726162B2 (en) 2021-04-16 2023-08-15 Rockwell Collins, Inc. System and method for neighbor direction and relative velocity determination via doppler nulling techniques
US11737121B2 (en) 2021-08-20 2023-08-22 Rockwell Collins, Inc. System and method to compile and distribute spatial awareness information for network
CN115297045A (en) * 2022-05-10 2022-11-04 北京邮电大学 Flooding topology construction method and device for low-earth-orbit satellite network and storage medium
CN116827801A (en) * 2023-08-25 2023-09-29 武汉吧哒科技股份有限公司 Network topology construction method, device, computer equipment and readable storage medium
CN116827801B (en) * 2023-08-25 2023-12-15 武汉吧哒科技股份有限公司 Network topology construction method, device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19710782

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19710782

Country of ref document: EP

Kind code of ref document: A1