WO2022078063A1 - Congestion information collection method, optimal path determination method, and network switch - Google Patents

Congestion information collection method, optimal path determination method, and network switch

Publication number
WO2022078063A1
WO2022078063A1 PCT/CN2021/113568 CN2021113568W WO2022078063A1 WO 2022078063 A1 WO2022078063 A1 WO 2022078063A1 CN 2021113568 W CN2021113568 W CN 2021113568W WO 2022078063 A1 WO2022078063 A1 WO 2022078063A1
Authority
WO
WIPO (PCT)
Prior art keywords
port
leaf switch
switch
leaf
path
Prior art date
Application number
PCT/CN2021/113568
Other languages
French (fr)
Chinese (zh)
Inventor
王领强
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/12 Shortest path evaluation
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control

Definitions

  • the present application relates to the field of communication technologies, and in particular, to a method for collecting congestion information, a method for determining an optimal path, a network switch, and a computer-readable storage medium in a spine-and-leaf network.
  • the data center adopts fat-tree networking (fat tree topology networking), and there are a large number of paths with equal costs for communication between servers.
  • the use of multi-path load balancing technology can achieve load balancing of traffic on different paths, greatly improving network throughput and high availability.
  • a common scheme is Equal-Cost Multipath Routing (ECMP).
  • ECMP is a hop-by-hop flow-based load balancing strategy. When a router finds multiple equal-cost paths for the same destination address, it will update the routing table and add multiple rules for this destination address, corresponding to multiple next hops. When forwarding traffic, the above-mentioned equal-cost paths can be used to forward data at the same time.
  • ECMP does not have a mechanism for congestion detection, and for a path that has already been congested, it is likely to aggravate the congestion of the path.
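  • The flow-based, congestion-unaware behaviour of ECMP described above can be illustrated with a minimal sketch. The hash function, the 5-tuple layout, and the next-hop names below are assumptions chosen for illustration; real routers use their own hardware hash functions.

```python
import hashlib

def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow tuple; congestion is never consulted."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Hypothetical flow: (source IP, destination IP, protocol, source port, destination port)
flow = ("10.0.0.1", "10.0.1.1", 6, 40000, 80)
next_hops = ["spine-103", "spine-104"]  # hypothetical equal-cost next hops

chosen = ecmp_next_hop(flow, next_hops)
# The same flow always maps to the same next hop, even if that path is congested.
```

  • Because the choice depends only on the flow tuple, a flow that hashes onto an already congested path stays there, which is the weakness the application sets out to address.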
  • another scheme is based on a Software Defined Network (SDN).
  • the principle of this scheme is that the SDN controller collects the congestion status of all links interconnecting the switches in a unified manner, and calculates the optimal path from one TOR (Top of Rack) switch to another in real time based on that congestion status. If the optimized path changes, the SDN controller delivers the forwarding table information corresponding to the latest optimized path to the relevant switches, and each switch updates its local forwarding table entries. The switches then forward traffic along the optimized path on a per-flow basis.
  • the disadvantage of this scheme is that the SDN controller needs to collect the congestion information of the entire network path, the calculation amount is huge, and the time for the entire network to update the optimized path is too long.
  • in another scheme, the aggregation switch periodically sends a congestion detection packet and multicasts it to the core node switches. When a core node switch receives the congestion detection packet, it adds the congestion information of its own ports and multicasts the packet to the other aggregation node switches, so that eventually all TOR switches obtain the congestion information of the entire network.
  • the TOR switch implements calculation and updates the optimized paths to other TOR switches, and finally realizes that traffic is forwarded according to the optimized path based on the flow.
  • the switch needs to send out congestion detection packets regularly. These congestion detection packets not only occupy the network bandwidth in the data center, but also increase the calculation amount of the forwarding device.
  • the main purpose of the embodiments of the present application is to propose a method for collecting congestion information, a method for determining an optimal path, a network switch, and a computer-readable storage medium in a spine-and-leaf network.
  • a first aspect of the embodiments of the present application proposes a method for collecting congestion information in a spine-and-leaf network, including: determining a network-side port; acquiring congestion information related to the network-side port; determining a path port of a first leaf switch according to a configuration policy; inserting the congestion information into an original packet according to the path port to obtain an intermediate packet; and sending the intermediate packet.
  • a second aspect of the embodiments of the present application provides a method for determining an optimal path in a spine-and-leaf network, including: receiving an intermediate packet sent by a first leaf switch through a spine switch; determining that congestion information exists in the intermediate packet; parsing the congestion information out of the intermediate packet; and calculating the minimum congestion path according to the congestion information and determining the minimum congestion path as the optimal path.
  • a third aspect of the embodiments of the present application provides a method for collecting congestion information in a spine-and-leaf network, including: acquiring an original packet sent by a first leaf switch from a network-side port; acquiring congestion information related to the network-side port; determining the path port of the first leaf switch; inserting the congestion information into the original packet according to the path port to obtain an intermediate packet; and sending the intermediate packet to the second leaf switch.
  • a fourth aspect of the embodiments of the present application provides a network switch for a spine-and-leaf network, including: at least one memory; at least one processor; and at least one program, where the program is stored in the memory and the processor executes the at least one program to implement: the method described in the first aspect above; or the method described in the second aspect above.
  • a fifth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are used to cause a computer to execute: the method described in the first aspect above; or the method described in the second aspect above; or the method described in the third aspect above.
  • FIG. 1 is a schematic diagram of an application scenario of the method for determining an optimal path for load balancing provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another application scenario of the method for determining an optimal path for load balancing provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another application scenario of the optimal path determination method for load balancing provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of an optimal path determination method for load balancing provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an application scenario of the method for determining an optimal path for load balancing provided by the first embodiment of the present application.
  • FIG. 6 is a flowchart of a method for determining an optimal path for load balancing provided by the second embodiment of the present application.
  • FIG. 7 is a schematic diagram of another application scenario of the method for determining an optimal path for load balancing provided by the second embodiment of the present application.
  • FIG. 8 is a partial flowchart of a schematic diagram of a second application scenario of the method for determining an optimal path for load balancing provided by the third embodiment of the present application.
  • FIG. 9 is a flowchart of a schematic diagram of a third application scenario of the method for determining an optimal path for load balancing provided by the fourth embodiment of the present application.
  • FIG. 10 is a flowchart of a schematic diagram of a fourth application scenario of the method for determining an optimal path for load balancing provided by the fifth embodiment of the present application.
  • FIG. 11 is a flowchart of a schematic diagram of a fifth application scenario of the method for determining an optimal path for load balancing provided by the fifth embodiment of the present application.
  • Equal-Cost Multipath Routing (ECMP): a hop-by-hop, flow-based load balancing strategy. When a router finds multiple equal-cost paths to the same destination address, it updates the routing table and adds multiple rules for that destination address, corresponding to multiple next hops; when forwarding traffic, it can use the equal-cost paths to forward data at the same time.
  • Fat-tree networking: also known as fat-tree topology networking.
  • Spine switch: in this application, a switch to which leaf switches connect.
  • Leaf switch: in this application, a switch to which servers connect.
  • Spine-leaf network structure: also known as a spine-leaf topology network structure; a network structure that includes leaf switches (which connect devices or servers) and spine nodes (which connect switches), and an important part of the network topology of the data center.
  • Uplink: in the embodiments of this application, a link from a leaf switch to a spine switch.
  • Downlink: in the embodiments of this application, a link from a spine switch to a leaf switch.
  • LLDP: Link Layer Discovery Protocol.
  • the application scenario of the embodiment of the present application is a data center network.
  • multi-path load balancing technology is a technology that achieves high throughput, low latency, and high availability.
  • a common solution for implementing the multi-path load balancing technology is the ECMP solution.
  • ECMP is a hop-by-hop flow-based load balancing strategy. When a router finds multiple equal-cost paths for the same destination address, it will update the routing table and add multiple rules for this destination address, corresponding to multiple next hops.
  • ECMP does not have a mechanism for congestion detection, and for a path that has already been congested, it is likely to aggravate the congestion of the path.
  • the embodiments of the present application propose a method for collecting congestion information in a spine-leaf network, a method for determining an optimal path, a network switch, and a computer-readable storage medium, which are applied to a two-layer spine-leaf network structure.
  • the traffic is forwarded according to the optimized path to improve the throughput of the entire network.
  • the optimal path determination method for load balancing in the embodiments of the present application is described.
  • FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application.
  • the method for collecting congestion information and the method for determining an optimal path in a spine-and-leaf network according to the embodiment of the present application are applied to a spine-leaf communication network (spine-leaf network), which includes at least: a first leaf switch 101, a second leaf switch 102, and a spine switch.
  • the spine-leaf network is a two-layer network consisting of a spine layer and a leaf layer. The spine layer includes spine switches and the leaf layer includes leaf switches. Each leaf switch is connected to a corresponding spine switch; the same leaf switch can be connected to multiple different spine switches, and the same spine switch can be connected to multiple different leaf switches. The first leaf switch 101 is the source leaf switch and is configured to access the source server 105; the second leaf switch 102 is the destination leaf switch and is configured to access the destination server 106.
  • the source server 105 is set to send packets, and the destination server 106 is set to receive packets; the packets sent by the source server 105 are sent to the destination server 106 via the first leaf switch 101, the spine switch, and the second leaf switch 102, and the destination The server 106 is arranged to receive messages sent by the second leaf switch 102 .
  • the spine switch includes a first spine switch 103 and a second spine switch 104 .
  • the first leaf switch 101 is configured to connect to the first spine switch 103
  • the second leaf switch 102 is configured to connect to the second spine switch 104 .
  • the network system may also include a third leaf switch 107 .
  • the source server and the destination server are terminal devices, each of which may be a desktop computer or a server.
  • Each switch (leaf switch or spine switch) may include a router or the like.
  • on a leaf switch, the port connected to a server is a user-side port, and the port connected to a spine switch is a network-side port.
  • for example, Port C of the first leaf switch 101 and Port C of the second leaf switch 102 are user-side ports, while Port A and Port B of the first leaf switch 101 and Port A and Port B of the second leaf switch 102 are network-side ports.
  • all ports on a spine switch are network-side ports: Port A, Port B, and Port C of the first spine switch 103 and Port A, Port B, and Port C of the second spine switch 104 are all network-side ports.
  • a direct connection between two switches may be called a link, and the sum of all links traversed by a packet from the source leaf switch to the destination leaf switch is a path, which passes through any intermediate switches (such as spine switches) between the source leaf switch and the destination leaf switch.
  • the uplink refers to a link from a leaf switch to a spine switch.
  • for example, the link from Port A of the second leaf switch 102 to Port C of the first spine switch 103 is an uplink.
  • the downlink refers to a link from a spine switch to a leaf switch.
  • the link from Port A of the first spine switch 103 to Port A of the first leaf switch 101 is a downlink.
  • the end-to-end path between leaf switches refers to the path from a network-side port of one leaf switch (the path head node) to a network-side port of another leaf switch (the path tail node).
  • the path is uniquely determined by an uplink and a downlink, and can be identified by the path head node and the path tail node.
  • for example, the path from Port A of the second leaf switch 102 to Port A of the first leaf switch 101 represents an end-to-end path from the second leaf switch 102 to the first leaf switch 101. The path is uniquely determined by an uplink (Port A of the second leaf switch 102 to Port C of the first spine switch 103) and a downlink (Port A of the first spine switch 103 to Port A of the first leaf switch 101), and can be uniquely identified by its head node (Port A of the second leaf switch 102) and its tail node (Port A of the first leaf switch 101).
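  • Since a path is identified by its head node and tail node, one possible way to model this is a pair of (switch, port) records used as a lookup key. The sketch below is only an illustration; the type names and switch labels are hypothetical and do not come from the application.

```python
from typing import NamedTuple

class Port(NamedTuple):
    switch: str   # hypothetical switch label, e.g. "leaf-102"
    name: str     # port name, e.g. "A"

class Path(NamedTuple):
    head: Port    # network-side port of the sending leaf switch
    tail: Port    # network-side port of the receiving leaf switch

# The example path from the text: Port A of leaf switch 102 to Port A of leaf switch 101.
path = Path(head=Port("leaf-102", "A"), tail=Port("leaf-101", "A"))

# Because (head, tail) identifies the path uniquely, it can key a per-path table.
congestion_by_path = {path: 0.0}
```

  • Using immutable tuples keeps the identifier hashable, so the same key can later index a congestion state table.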
  • the packet flow between a pair of leaf switches passes through the path from the source leaf switch to the spine switch, and then to the destination leaf switch.
  • the port on which the switch receives the packet is also called the ingress port, which can be used to identify the switch that sends the packet. For example, as shown in FIG. 1 , any packet on the port can be determined to be sent from the first leaf switch 101 .
  • a port may be a duplex port, that is, a single port may be used for receiving messages (ie, serving as an ingress port) and for sending messages (ie, serving as an outgoing port).
  • Congestion may occur (for example, as shown in FIG. 1 , the packet is transmitted from Port C to Port A of the first leaf switch 101). Congestion may depend on the port utilization of the switch, the transfer rate of the port, queue congestion at the port, and/or processor and memory resources, among others.
  • each leaf switch can be connected to one of the spine switches through a link; in another embodiment, each leaf switch can be connected to one of the spine switches through multiple links.
  • Figure 1 shows a link between each leaf switch and one of the spine switches.
  • FIG. 2 is a schematic diagram of another application scenario of the embodiment of the present application. Different from FIG. 1, each leaf switch in FIG. 2 is connected to one of the spine switches through multiple links. For example, the first leaf switch 101 has one link from its Port A to Port A of the first spine switch 103 and another from its Port B to Port B of the first spine switch 103; it also has one link from its Port E to Port A of the second spine switch 104 and another from its Port F to the second spine switch 104. Likewise, the second leaf switch 102 has one link from its Port A to Port C of the first spine switch 103 and another from its Port B to Port D of the first spine switch 103; it also has one link from its Port E to Port C of the second spine switch 104 and another from its Port F to Port D of the second spine switch 104.
  • FIG. 3 is a schematic diagram of another application scenario of the embodiment of the present application. Different from FIG. 1 , the optimal path determination method for load balancing shown in FIG. 3 is applied to a single network device.
  • a single network device includes multiple chips to implement the functions of the spine-leaf network structure shown in Figure 1.
  • the first leaf switch is used as the first leaf module (for example, chip 1)
  • the second leaf switch is used as the second leaf module.
  • the source server can be communicatively connected to the first panel port
  • the destination server can be communicatively connected to the second panel port.
  • FIG. 4 is a flowchart of a method for collecting congestion information in a spine-and-leaf network provided by an embodiment of the present application.
  • the method for collecting congestion information in a spine-and-leaf network in FIG. 4 at least includes steps 201 to 205 .
  • the method for collecting congestion information in the spine-and-leaf network shown in FIG. 4 is performed by the first leaf switch.
  • Step 201: determine the network-side port.
  • Step 202: obtain congestion information related to the network-side port.
  • Step 203: determine the path port of the first leaf switch according to the configuration policy.
  • Step 204: insert the congestion information into the original packet according to the path port to obtain an intermediate packet.
  • Step 205: send the intermediate packet.
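  • Steps 201 to 205 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Packet` type, the dictionary layout, and the `choose_path_port` policy callback are hypothetical, and the application does not prescribe where in the packet the congestion information is placed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Packet:
    payload: bytes
    # Inserted (path port, congestion value) records; empty for an original packet.
    congestion: List[Tuple[str, float]] = field(default_factory=list)

def collect_and_insert(
    port_sides: Dict[str, str],
    utilization: Dict[str, float],
    choose_path_port: Callable[[List[str]], str],
    original: Packet,
) -> Packet:
    # Step 201: determine the network-side ports (those facing spine switches).
    network_ports = sorted(p for p, side in port_sides.items() if side == "network")
    # Step 202: obtain congestion information for each network-side port.
    info = {p: utilization[p] for p in network_ports}
    # Step 203: determine the path port according to the configured policy.
    path_port = choose_path_port(network_ports)
    # Step 204: insert the congestion information to obtain the intermediate packet.
    intermediate = Packet(original.payload, [(path_port, info[path_port])])
    # Step 205: "send" the intermediate packet (returned to the caller in this sketch).
    return intermediate

port_sides = {"A": "network", "B": "network", "C": "user"}
utilization = {"A": 0.7, "B": 0.2}          # hypothetical bandwidth utilization values
original = Packet(b"hello")
intermediate = collect_and_insert(port_sides, utilization, lambda ports: ports[0], original)
```

  • The policy here simply picks the first network-side port; any other selection rule could be passed in its place.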
  • on the first leaf switch, the port connected to the server is the user-side port, and the port connected to the spine switch is the network-side port.
  • the congestion information includes path port information and path congestion data; and the congestion information related to the port on the network side is obtained through the congestion information calculation node.
  • the congestion information calculation node can be used to calculate the congestion information of its local network-side ports. When an original packet passes through the congestion information calculation node, the node can also insert the locally calculated congestion information into the original packet according to the policy configured by the user.
  • the congestion information calculation node may be a leaf switch; in other embodiments, the congestion information calculation node may be a spine switch.
  • if the congestion information calculation node is a leaf switch, all leaf switches in the leaf layer are congestion information calculation nodes. For example, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107 are all congestion information calculation nodes, and all spine switches in the spine layer are normal forwarding nodes.
  • if the congestion information calculation node is a spine switch, all spine switches in the spine layer are congestion information calculation nodes. For example, the first spine switch 103 and the second spine switch 104 are congestion information calculation nodes, and all leaf switches in the leaf layer are normal forwarding nodes. Packets can be forwarded by the forwarding nodes.
  • the congestion information may be the congestion information of the outgoing port of the packet, the congestion information of a network-side port of the device selected according to a certain policy, or the inbound-direction congestion information of all network-side ports of the device; this is not limited in this embodiment of the present application.
  • Congestion data can include network-side port bandwidth utilization, timestamp information, or any other information that can identify port or link congestion status. In one embodiment, these congestion data may be used alone; in another embodiment, the congestion data may also be used in combination.
  • the congestion information calculation node acquires congestion information related to all network side ports of the device.
  • if the congestion information calculation node is a leaf switch, the first leaf switch 101 executes the above steps 201 to 205; the spine switch acts as a normal forwarding node and performs normal packet forwarding.
  • if the congestion information calculation node is a spine switch, the first spine switch 103 executes the method for collecting congestion information in a spine-and-leaf network of another embodiment, which will be described in detail later.
  • the second leaf switch 102 performs the following method of determining the optimal path in the spine-leaf network after receiving the intermediate message from the first spine switch 103; the first leaf switch 101 acts as a normal forwarding node and performs normal message forwarding .
  • the original message refers to a message sent by the source server to the destination server.
  • the original packet sent by the source server needs to flow through the first leaf switch, spine switch, and second leaf switch, and then forwarded to the destination server through the second leaf switch.
  • the congestion information may be inserted into any position in the original packet, and the embodiment of the present application does not limit the position at which the congestion information is inserted into the original packet.
  • Step 203 in some embodiments includes:
  • the method for collecting congestion information in a spine-and-leaf network further includes:
  • the method for collecting congestion information in a spine-and-leaf network further includes:
  • a first data packet is sent from the network side port corresponding to the timer to the second leaf switch corresponding to the timer, where the first data packet includes congestion information.
  • the method for collecting congestion information in a spine-and-leaf network further includes:
  • the congestion information is inserted into the original packet according to the ID information of the port on the network side and the ID information of the second leaf switch to obtain an intermediate packet.
  • the embodiment of the present application also provides a method for determining an optimal path in a spine-and-leaf network, the method for determining the optimal path is performed by a second leaf switch, and the method for determining the optimal path includes:
  • the minimum congestion path is calculated according to the congestion information, and the minimum congestion path is determined as the optimal path.
  • the second leaf switch after receiving the intermediate packet, parses the congestion information from the intermediate packet, and processes the parsed congestion information to obtain the original packet, path port information, and path congestion data.
  • if the congestion information calculation node is a leaf switch, the second leaf switch parses the congestion information inserted in the intermediate packet to obtain the congestion information of a certain downlink; the second leaf switch obtains the congestion situation of a certain uplink from the congestion information calculated by the second leaf switch itself; and the second leaf switch updates the congestion state of the corresponding end-to-end path from the second leaf switch to the first leaf switch according to the congestion situation of the uplink and the downlink.
  • if the congestion information calculation node is a spine switch, the second leaf switch parses the congestion information inserted in the original packet and the related packet information, and obtains the congestion information of the uplink and the downlink; the second leaf switch then updates the congestion state of the corresponding end-to-end path from the second leaf switch to the first leaf switch.
  • the congestion state table updated by the second leaf switch can be referred to as shown in Table 1.
  • the column coordinates of Table 1 represent all leaf switches in the network (for example, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107), and the abscissa of Table 1 represents the current leaf switch (for example, the first leaf switch 101)
  • the path can be uniquely identified by a path head node (eg, Port A of the first leaf switch 101 ) and a trailing node (eg, Port A of the second leaf switch 102 ).
  • the congestion state of a path can be represented by the port utilization in the outbound direction of the first node of the path and the port utilization in the inbound direction of the trailing node.
  • the maximum of the two port utilizations is taken as the congestion information of the entire path. For example, in FIG. 5, for a path from the second leaf switch 102 to the first leaf switch 101, the head node of the path is Port A of the second leaf switch 102, and the tail node of the path is Port A of the first leaf switch 101.
  • if the bandwidth utilization of Port A of the second leaf switch 102 (the outbound direction of the path head node) is 0.3 and the bandwidth utilization of Port A of the first leaf switch 101 (the inbound direction of the path tail node) is 0.7, the overall bandwidth utilization of the path is 0.7, that is, the maximum of 0.3 and 0.7.
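  • The rule above, taking the larger of the head node's outbound utilization and the tail node's inbound utilization, can be written as a one-line function. The function and parameter names are illustrative only.

```python
def path_congestion(head_out_utilization: float, tail_in_utilization: float) -> float:
    """Congestion of a path: the larger of the two port utilizations."""
    return max(head_out_utilization, tail_in_utilization)

# Values from the example: 0.3 at the head node (outbound), 0.7 at the tail node (inbound).
overall = path_congestion(0.3, 0.7)
```

  • Taking the maximum means the path is treated as being as congested as its more heavily loaded end.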
  • the first leaf switch when the first leaf switch sends the packet to the second leaf switch, the first leaf switch may select the optimal path for forwarding based on the flowlet flow.
  • the second leaf switch updates the congestion information of one of the end-to-end paths from this device (that is, the second leaf switch) to the first leaf switch, and the congestion information of this path is finally saved in the local congestion state table.
  • the method of determining the optimal path further includes:
  • a second data packet is sent from the network side port corresponding to the timer to the second leaf switch corresponding to the timer, wherein the second data packet does not include congestion information.
  • the method for determining the optimal path provided by the embodiment of the present application further includes:
  • Delete the parsed congestion information to obtain the original packet. Specifically, by stripping the inserted congestion information, that is, deleting the inserted congestion information, the original packet is finally obtained, and the original packet is sent to the second leaf switch, so as to forward the original packet to the destination server through the second leaf switch , so that the destination server can receive the original message.
  • the method for collecting congestion information in a spine-and-leaf network includes:
  • all leaf switches finally obtain the congestion information of the end-to-end paths from the local node to the other leaf switches, and calculate the least congested path to each of the other leaf switches.
  • the first leaf switch may select a path with the least congestion for forwarding based on the flowlet flow.
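  • Selecting the least congested path from a local congestion state table can be sketched as below. The table layout (destination leaf switch mapped to per-path congestion values keyed by head port and tail port) is an assumption for illustration, and flowlet detection itself is not modelled.

```python
from typing import Dict, Tuple

PathKey = Tuple[str, str]  # (local head port, remote tail port); hypothetical layout

def least_congested_path(
    state_table: Dict[str, Dict[PathKey, float]], destination_leaf: str
) -> PathKey:
    """Return the path to the destination leaf switch with the smallest congestion value."""
    paths = state_table[destination_leaf]
    return min(paths, key=paths.get)

# Hypothetical congestion state table held by the first leaf switch.
state_table = {"leaf-102": {("A", "A"): 0.7, ("B", "B"): 0.4}}
best = least_congested_path(state_table, "leaf-102")
```

  • In this example the path through Port B is chosen because its recorded congestion (0.4) is lower than that of the path through Port A (0.7).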
  • only the congestion information calculation nodes (leaf switches or spine switches) need to enable the congestion information calculation function, and this function does not need to be deployed on the entire network, which reduces deployment cost and difficulty; throughout the packet transmission process, the leaf switch only needs to insert the relevant congestion information into the original packet.
  • the congestion information is updated faster, and the effect of traffic load balancing is more obvious.
  • FIG. 6 is a flowchart of the first application scenario of the method for collecting congestion information and the method for determining an optimal path provided by an embodiment of the present application.
  • in this scenario, a leaf switch is used as the congestion information calculation node and bandwidth utilization is used as the criterion for judging path congestion, by way of example.
  • the method in FIG. 6 includes steps 301 to 311 .
  • the source server 105 is configured to send the message to the destination server 106 .
  • all leaf switches are congestion information calculation nodes and need to enable the congestion information calculation function; that is, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107 are congestion information calculation nodes with the function enabled. The spine switches serve only as forwarding nodes and need not enable the congestion information calculation function; that is, the first spine switch 103 and the second spine switch 104 need only perform normal packet forwarding.
  • Step 301 The source server sends an original packet.
  • Step 302 The first leaf switch determines whether the port is a network side port; if it is determined to be a network side port, then step 303 is performed; otherwise, step 304 is performed.
  • Step 303 The first leaf switch inserts the congestion information to obtain an intermediate packet.
  • Step 304 The first leaf switch normally forwards the original packet.
  • Step 305 the spine switch normally forwards the intermediate packet; specifically, the spine switch normally forwards the intermediate packet to the second leaf switch.
  • Step 306 The second leaf switch determines whether there is congestion information in the intermediate packet. If so, step 307 is performed; otherwise, step 311 is performed. Specifically, the second leaf switch makes this determination after receiving the intermediate packet forwarded by the spine switch.
  • Step 307 The second leaf switch parses out the congestion information. Specifically, after parsing the congestion information, the second leaf switch obtains the ID information of the first leaf switch, the path port information of the first leaf switch and the related path congestion data, as well as the ID information of the second leaf switch, the path port information of the second leaf switch and the related path congestion data.
  • Step 308 The second leaf switch calculates the congestion information of one of the end-to-end paths from the second leaf switch to the first leaf switch, and updates it into the local congestion state table.
  • Step 309 Delete the congestion information, obtain the original packet, and forward the original packet to the destination server.
  • Step 310 The destination server receives the original message.
  • Step 311 the second leaf switch normally forwards the intermediate packet.
  • users can enable the congestion information calculation function through configuration on the leaf switch.
  • the congestion information is related to the network-side ports of the leaf switch. Users can statically configure or enable protocols (such as the LLDP protocol, etc.) to find out which interfaces are network-side interfaces on the leaf switch.
  • the congestion information may be network-side port bandwidth utilization information (including outbound direction or inbound direction) or any other information that can identify a port or link congestion state.
  • The source server 105 sends the original message to the destination server 106.
  • The original packet is assumed to be forwarded along the following path: Port C of the first leaf switch 101 -> Port A of the first leaf switch 101 -> Port A of the first spine switch 103 -> Port C of the first spine switch 103 -> Port A of the second leaf switch 102 -> Port C of the second leaf switch 102.
  • the network side port is determined through step 302 .
  • the first leaf switch determines that the original packet needs to be sent from the network side port, and at this time, the first leaf switch needs to insert the congestion information into the original packet.
  • The first leaf switch finds that the outgoing port of the original packet is Port A, and Port A is a network-side port; the first leaf switch then needs to insert the congestion information of its path port into the original packet.
  • The path port of the first leaf switch refers to a port of the first leaf switch 101 whose congestion information needs to be inserted. For example, as shown in FIG. 5, if the first leaf switch 101 needs to insert the congestion information of Port A, then Port A is a path port of the first leaf switch and Port B is not.
  • Step 303 includes: acquiring congestion information related to the network side port; the network side port is determined in step 302 .
  • Step 303 further includes: determining the path port according to the configuration policy. Specifically, on the first leaf switch, the user can configure a related policy, and determine the path port of the first leaf switch according to the corresponding configuration policy.
  • the first configuration strategy may be: the outgoing port of the packet may be selected as the path port of the first leaf switch.
  • the outgoing port of the packet of the first leaf switch 101 is Port A, and we can use Port A as the path port of the first leaf switch.
  • the second configuration strategy may be: a certain network-side port of the device may be selected as the path port of the first leaf switch in a preset manner, such as a polling manner.
  • the first leaf switch 101 may select a certain network side port of itself as a path port of the first leaf switch in a polling manner.
  • Port A is selected as the path port of the first leaf switch when the first packet arrives, and Port B can be selected as the path port when the second packet arrives. Because the first leaf switch 101 has only two network-side ports, Port A and Port B, the current round of port polling then ends. When the third packet arrives, the first leaf switch 101 selects Port A again as the path port of the first leaf switch, and so on.
  • the third configuration strategy may be: all network-side ports of the first leaf switch 101 may be used as path ports of the first leaf switch at the same time.
  • the first leaf switch 101 can use all its own network side ports (Port A and Port B) as the path ports of the first leaf switch at the same time.
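  • The three configuration strategies above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `make_path_port_selector` and the strategy labels "outgoing", "polling", and "all" are hypothetical names introduced here, and ports are represented as plain strings.

```python
from itertools import cycle
from typing import Callable, Iterator, List


def make_path_port_selector(strategy: str,
                            network_ports: List[str]) -> Callable[[str], List[str]]:
    """Return a function that maps a packet's outgoing port to its path ports."""
    polling: Iterator[str] = cycle(network_ports)  # only used by the "polling" strategy

    def select(outgoing_port: str) -> List[str]:
        if strategy == "outgoing":   # first strategy: the packet's outgoing port
            return [outgoing_port]
        if strategy == "polling":    # second strategy: next network-side port in turn
            return [next(polling)]
        if strategy == "all":        # third strategy: every network-side port at once
            return list(network_ports)
        raise ValueError(f"unknown strategy: {strategy}")

    return select


# Three packets leave through Port A while the polling strategy is configured.
select = make_path_port_selector("polling", ["Port A", "Port B"])
picks = [select("Port A")[0] for _ in range(3)]
print(picks)  # ['Port A', 'Port B', 'Port A']
```

  • With the "all" strategy, the same call returns both Port A and Port B, so one piece of congestion information would be inserted per port.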
  • After the first leaf switch 101 determines the configuration policy, it needs to insert the congestion information related to the path port of the first leaf switch into the original packet.
  • a path port of the first leaf switch corresponds to a piece of congestion information. If there are multiple path ports of the first leaf switch, multiple pieces of congestion information need to be inserted. This embodiment of the present application does not limit the amount of congestion information. Each piece of congestion information can optionally be inserted anywhere in the packet.
  • Each piece of congestion information can describe one attribute of the path port of the first leaf switch (such as the bandwidth utilization in the inbound direction of the port) or multiple attributes (for example, the bandwidth utilization in both the outbound and inbound directions of the port, timestamp information, etc.).
  • the attributes that the congestion information can describe include: ID information of the first leaf switch, where the ID information includes an ID number, and the ID number uniquely identifies the first leaf switch 101 .
  • the attributes that can be described by the congestion information further include: ID information of the path port of the first leaf switch; the ID information uniquely identifies the port on the first leaf switch 101 .
  • the attributes that can be described by the congestion information also include: congestion attribute information; the congestion attribute information refers to one or more congestion attributes related to the path port on the first leaf switch, for example: the bandwidth utilization in the inbound direction of the port is the first congestion attribute, and the timestamp at which the original packet enters the first leaf switch 101 is the second congestion attribute.
  • the congestion attribute must include the bandwidth utilization in the inbound direction of the path port of the first leaf switch.
  • the bandwidth utilization information indicates the congestion state of a certain downlink from the spine switch to the leaf switch.
  • The congestion attribute of the port includes the bandwidth utilization in the inbound direction, and this bandwidth utilization represents the congestion of the downlink from Port A of the first spine switch 103 to Port A of the first leaf switch 101.
  • the intermediate packet is sent from Port A of the first leaf switch 101.
  • the processing procedure of the first leaf switch is described.
  • the first leaf switch 101 receives a packet from Port C of the device, and sends a packet from Port A of the device.
  • the first leaf switch 101 finds that the packet is sent from a network-side port (Port A).
  • the first leaf switch 101 determines through a locally configured policy that Port A is the path port of the first leaf switch, and the first leaf switch 101 needs to insert the congestion information related to Port A into the original packet.
  • the content of the congestion information inserted by the first leaf switch 101 may include: the ID number of the first leaf switch.
  • The content of the congestion information inserted by the first leaf switch 101 may further include path port number information of the first leaf switch, for example, the ID number of Port A of the first leaf switch 101. It may also include congestion attributes related to the path port of the first leaf switch, such as the bandwidth utilization in the inbound direction of Port A of the first leaf switch 101, which represents the congestion of the downlink from Port A of the first spine switch 103 to Port A of the first leaf switch 101.
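  • As a rough illustration of the record described above, the sketch below models one piece of congestion information and its insertion into a packet. The text does not fix an encoding or a position in the packet, so everything here is an assumption: the `CongestionInfo` class, its field names, the dictionary packet model, and the use of the reference numeral 101 as the switch ID are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CongestionInfo:
    leaf_id: int                       # ID number that uniquely identifies the leaf switch
    path_port_id: str                  # ID of the path port on that leaf switch
    inbound_utilization: float         # mandatory attribute: inbound bandwidth utilization
    timestamp: Optional[float] = None  # optional second attribute: packet arrival time


def insert_congestion_info(original_packet: dict, info: CongestionInfo) -> dict:
    """Return an intermediate packet: a copy of the packet with one more record."""
    records = list(original_packet.get("congestion_info", [])) + [info]
    return {**original_packet, "congestion_info": records}


original = {"src": "server 105", "dst": "server 106", "payload": b"data"}
intermediate = insert_congestion_info(original, CongestionInfo(101, "Port A", 0.3))
print(len(intermediate["congestion_info"]))  # 1
```

  • Calling `insert_congestion_info` once per path port yields multiple records in the same packet, which matches the case where several path ports are configured.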
  • the first spine switch 103 receives the intermediate packet from Port A of the device, performs normal forwarding processing, and sends the intermediate packet from Port C of the device.
  • The second leaf switch 102 receives the intermediate message from Port A of the device and inspects it; if the intermediate message is found to contain one or more pieces of inserted congestion information, step 307 is executed.
  • In this example, the congestion information is inserted by the first leaf switch 101, and the second leaf switch 102 parses the congestion information.
  • In step 307, the second leaf switch 102 parses out the congestion information and thereby obtains the ID information of the first leaf switch, the path port information of the first leaf switch and the related path congestion data, as well as the ID information of the second leaf switch, the path port information of the second leaf switch and the related path congestion data. Specifically, step 307 includes:
  • Obtaining the congestion attribute related to the path port of the first leaf switch; in this example, the bandwidth utilization information of the path port of the first leaf switch is obtained.
  • Step 307 also includes:
  • the port number is the ID number of Port A of the second leaf switch 102;
  • the locally calculated bandwidth utilization information of the port in the outbound direction is obtained.
  • the bandwidth utilization information identifies the congestion state of an uplink from leaf to spine. For example, in Figure 3, the bandwidth utilization indicates the congestion of the uplink from leaf 3 switch Port A to spine 1 switch Port C.
  • In step 308, the bandwidth utilization in the inbound direction of the path port of the first leaf switch is compared with the bandwidth utilization in the outbound direction of the path port of the second leaf switch; the maximum value (the value indicating the most serious congestion) is taken as the congestion value of a certain path from the second leaf switch 102 to the first leaf switch 101.
  • the path is uniquely determined by the path port (path head node) of the second leaf switch and the path port (path tail node) of the first leaf switch;
  • the path can be uniquely identified by a path head node (e.g., Port A of the first leaf switch 101) and a trailing node (e.g., Port A of the second leaf switch 102).
  • the congestion state of a path can be represented by the port utilization in the outbound direction of the first node of the path and the port utilization in the inbound direction of the trailing node.
  • The maximum of the two port utilizations is selected as the congestion information of the entire path. For example, in FIG. 5, for a path from the second leaf switch 102 to the first leaf switch 101, the head node of the path is Port A of the second leaf switch 102, and the tail node of the path is Port A of the first leaf switch 101.
  • If the bandwidth utilization of Port A of the second leaf switch 102 (the outbound direction of the path head node) is 0.3 and the bandwidth utilization of Port A of the first leaf switch 101 (the inbound direction of the trailing node) is 0.7, then the overall bandwidth utilization of the path is 0.7, that is, the maximum of 0.3 and 0.7.
  • the second leaf switch 102 updates the related path information. If there are multiple pieces of congestion information, the second leaf switch 102 may analyze the congestion information piece by piece according to the above steps, and finally obtain multiple end-to-end paths from the second leaf switch 102 to the first leaf switch 101 . The state information of the above-mentioned multiple paths is updated into the local path congestion state table.
  • The second leaf switch deletes the congestion information from the intermediate message, thereby obtaining the original message, and forwards the original message to the destination server through Port C, so that the destination server 106 receives the original message.
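  • The handling of an incoming packet at the second leaf switch (steps 306 to 311) can be sketched as below. This is an illustrative model only: packets are represented as dictionaries, and the key `congestion_info` and the function name `process_at_second_leaf` are assumptions, not part of the described method.

```python
from typing import List, Tuple


def process_at_second_leaf(packet: dict) -> Tuple[dict, List[dict]]:
    """Return (packet to forward, congestion records parsed from it)."""
    records = packet.get("congestion_info")
    if not records:
        # Step 306, "no" branch: nothing was inserted, forward unchanged (step 311).
        return packet, []
    # Step 309: delete the congestion information to recover the original packet.
    original = {k: v for k, v in packet.items() if k != "congestion_info"}
    # The returned records are the input to steps 307 and 308.
    return original, list(records)


intermediate = {
    "dst": "server 106",
    "congestion_info": [{"leaf_id": 101, "port": "Port A", "inbound_util": 0.3}],
}
forwarded, records = process_at_second_leaf(intermediate)
print(forwarded)  # {'dst': 'server 106'}
```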
  • the processing procedure of the second leaf switch 102 is exemplified.
  • The second leaf switch 102 receives an intermediate message from Port A, and sends the original message, with the congestion information deleted, from Port C.
  • The second leaf switch 102 inspects the intermediate packet and, if one or more pieces of congestion information have been inserted into it, parses the congestion information. Specifically:
  • the second leaf switch 102 parses the congestion information and obtains the first leaf switch ID, the ID value of Port A, and the bandwidth utilization in the inbound direction of Port A, assumed to be 0.3;
  • the second leaf switch 102 obtains its own switch ID, the ID value of Port A, and the bandwidth utilization in the outbound direction of Port A (the inbound port of the packet), assumed to be 0.7;
  • the second leaf switch 102 calculates the congestion value of a certain path from the second leaf switch 102 to the first leaf switch 101: it compares the bandwidth utilization (0.7) in the outbound direction of Port A of the second leaf switch 102 with the bandwidth utilization (0.3) in the inbound direction of Port A of the first leaf switch 101 and takes the larger value, 0.7, as the congestion value of the path. The path head node of the path is Port A of the second leaf switch 102, and the trailing node of the path is Port A of the first leaf switch 101;
  • the second leaf switch 102 updates the local path congestion state table: according to the ID information of the first leaf switch, the ID information of its Port A, the ID information of the second leaf switch, and the ID information of Port A of the second leaf switch 102, it finds the relevant path entry in the local path congestion state table and updates the congestion information of the path to 0.7.
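  • The calculation and table update in this example can be written out as a short sketch. The table layout is an assumption made for illustration: a dictionary keyed by (head leaf ID, head port, tail leaf ID, tail port), with the reference numerals 102 and 101 standing in for the switch IDs; `update_path_state` is a hypothetical name.

```python
from typing import Dict, Tuple

# Key: (head leaf ID, head port, tail leaf ID, tail port); value: congestion value.
PathKey = Tuple[int, str, int, str]


def update_path_state(table: Dict[PathKey, float],
                      local_leaf: int, local_port: str, local_outbound_util: float,
                      remote_leaf: int, remote_port: str,
                      remote_inbound_util: float) -> float:
    """Store and return the congestion of the path local_port -> remote_port."""
    # The path is as congested as the worse of its two end links.
    congestion = max(local_outbound_util, remote_inbound_util)
    table[(local_leaf, local_port, remote_leaf, remote_port)] = congestion
    return congestion


table: Dict[PathKey, float] = {}
# Second leaf switch 102 (Port A outbound 0.7) parses a record from
# first leaf switch 101 (Port A inbound 0.3).
value = update_path_state(table, 102, "Port A", 0.7, 101, "Port A", 0.3)
print(value)  # 0.7
```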
  • When a server (e.g., the source server 105) sends packets to another server (e.g., the destination server 106), part of the network traffic is forwarded along other paths.
  • For example, the source server 105 sends a message to the destination server 106, and the message is assumed to be transmitted along the following path: Port C of the first leaf switch 101 -> Port B of the first leaf switch 101 -> Port A of the second spine switch 104 -> Port C of the second spine switch 104 -> Port B of the second leaf switch 102 -> Port C of the second leaf switch 102.
  • The first leaf switch 101 repeats steps 302 to 304, the second spine switch 104 repeats step 305, and the second leaf switch 102 repeats steps 306 to 309; the second leaf switch 102 thereby obtains the congestion status of another end-to-end path from itself to the first leaf switch 101.
  • The path head node of this path is Port B of the second leaf switch 102, and the tail node of the path is Port B of the first leaf switch 101.
  • In this way, the second leaf switch 102 can obtain and continuously update the congestion status of the related end-to-end paths from itself to the first leaf switch 101.
  • This embodiment of the present application uses flow-based multi-path forwarding, and all leaf switches need to have a multi-path forwarding function based on flowlet flows (i.e., small flows).
  • When the first leaf switch 101 receives a packet, it calculates the flow identifier of the packet according to a preset rule and uses the flow identifier to index the relevant entry in the flow forwarding table. The preset rule may be: the hash value calculated from the five-tuple extracted from the packet is used as the flow identifier;
  • the first leaf switch 101 then searches the path congestion state table according to the destination address; the congestion state table stores the congestion information of all end-to-end paths from the first leaf switch 101 to the second leaf switch 102 (which can be determined from the destination address of the message); for the content of the path congestion state table, refer to the description of Table 1 above;
  • the first leaf switch 101 finds the outgoing port with the least congestion, writes the outgoing port into the flow forwarding table, and sets the flow forwarding table to be valid;
  • the first leaf switch 101 sends the packet from the port
  • the leaf switch directly forwards packets according to the outbound port identified by the flow forwarding table.
  • the flow forwarding table implemented in this application has an aging mechanism. If an entry in the flow forwarding table is not refreshed within T time, the entry is set to be invalid.
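  • The flow forwarding table with its aging mechanism can be sketched as follows. This is a simplified model under stated assumptions: the class `FlowletForwarder`, the use of CRC32 as the five-tuple hash, and the `port_congestion` argument (outgoing port mapped to the congestion of the path through it, as would be read from the path congestion state table) are all hypothetical choices, not details given in the text.

```python
import zlib
from typing import Dict, Tuple

FiveTuple = Tuple[str, str, int, int, str]  # src IP, dst IP, src port, dst port, protocol


class FlowletForwarder:
    def __init__(self, aging_time: float):
        self.aging_time = aging_time  # the time T after which an entry becomes invalid
        # flow identifier -> (outgoing port, time of last refresh)
        self.flow_table: Dict[int, Tuple[str, float]] = {}

    @staticmethod
    def flow_id(five_tuple: FiveTuple) -> int:
        # Stable hash of the five-tuple; CRC32 is only an illustrative choice.
        return zlib.crc32(repr(five_tuple).encode())

    def forward(self, five_tuple: FiveTuple, now: float,
                port_congestion: Dict[str, float]) -> str:
        fid = self.flow_id(five_tuple)
        entry = self.flow_table.get(fid)
        if entry is not None and now - entry[1] < self.aging_time:
            out_port = entry[0]  # entry still valid: keep the flowlet on its path
        else:
            # No valid entry: choose the least congested outgoing port.
            out_port = min(port_congestion, key=port_congestion.get)
        self.flow_table[fid] = (out_port, now)  # write or refresh the entry
        return out_port


fwd = FlowletForwarder(aging_time=0.5)
flow = ("10.0.0.5", "10.0.1.6", 40000, 80, "tcp")
first = fwd.forward(flow, now=0.0, port_congestion={"Port A": 0.7, "Port B": 0.2})
second = fwd.forward(flow, now=0.1, port_congestion={"Port A": 0.1, "Port B": 0.9})
third = fwd.forward(flow, now=1.0, port_congestion={"Port A": 0.1, "Port B": 0.9})
print(first, second, third)  # Port B Port B Port A
```

  • The second packet stays on Port B even though Port A has become less congested, because the entry was refreshed within T; only after the gap exceeds T does the flow move to the less congested port.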
  • Leaf switches or spine switches are used to calculate the congestion information in the network, and the congestion information is inserted into the normal original packets to obtain intermediate packets.
  • The second leaf switch receives the intermediate packets and parses the congestion information from them. Finally, every leaf switch obtains the congestion information of the links between itself and the other leaf switches.
  • The leaf switch calculates the end-to-end least congested path and updates its forwarding entry. During traffic forwarding, the leaf switch forwards traffic along the least congested path on a per-flowlet basis, which ultimately improves the throughput of the entire network.
  • FIG. 8 is a new flowchart of the second application scenario based on the method for collecting congestion information and the method for determining an optimal path provided by the embodiment of the present application.
  • In FIG. 8, timers are configured with each network-side port of the local switch and each other leaf switch as the dimensions.
  • the optimal path determination method for load balancing shown in FIG. 8 adds at least steps 401 to 403 on the basis of FIG. 4 .
  • A timer may be configured on the leaf switch for each combination of a network-side port of the local switch and another leaf switch. When a timer expires, the first leaf switch 101 actively sends a congestion packet from the related network-side port to the related leaf switch.
  • the difference between the optimal path determination method in the second application scenario and the optimal path determination method in the first application scenario is that in the processing of the first leaf switch 101 , whether to actively send a packet is determined by configuring a corresponding timer.
  • Step 401 configure a timer for the corresponding leaf switch.
  • One timer is configured for each combination of a network-side port of the local switch and another leaf switch.
  • For example, on the first leaf switch 101, four timers need to be configured.
  • The relationship between the timers, the local network-side ports, and the other leaf switches is shown in Table 2 below.
  • Timer 1 corresponds to Port A of the first leaf switch 101 and the third leaf switch 107; Timer 2 corresponds to Port A of the first leaf switch 101 and the second leaf switch 102; Timer 3 corresponds to Port B of the first leaf switch 101 and the third leaf switch 107; Timer 4 corresponds to Port B of the first leaf switch 101 and the second leaf switch 102.
  • Table 2
    Timer   | Local network-side port | Second leaf switch ID information
    Timer 1 | Port A                  | Third leaf switch 107
    Timer 2 | Port A                  | Second leaf switch 102
    Timer 3 | Port B                  | Third leaf switch 107
    Timer 4 | Port B                  | Second leaf switch 102
  • Step 402 Clear the corresponding timer to zero.
  • the leaf switch when the leaf switch sends a packet from a certain network port of itself to other leaf switches, the related timer is cleared and the timing is restarted.
  • the first leaf switch 101 sends a packet from Port A of the device to the second leaf switch 102, then the timer 2 is cleared and the timing is restarted.
  • Step 403 If the timer times out, the leaf switch sends the first data packet to the corresponding other leaf switches.
  • the leaf switch needs to actively send the first data packet to the leaf switch related to the timer through the network side port related to the timer.
  • the destination IP of the first data packet may be the IP address of the relevant leaf switch itself.
  • The first data packet must include at least the congestion information in the inbound direction of the timer-related port; its other content can be freely defined.
  • For example, the timers are configured as shown in Table 2. If timer 2 times out, then, because timer 2 is associated with the second leaf switch 102 and with the local network-side port Port A, the first leaf switch 101 actively sends the first data packet from Port A of the device to the second leaf switch 102.
  • The destination IP address of the first data packet is the IP address of the second leaf switch 102 itself, and the first data packet needs to include the congestion information in the inbound direction of Port A of the first leaf switch 101.
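  • Steps 401 to 403 can be sketched with a small timer table. The sketch is illustrative only: the class `ProbeTimers`, the explicit `now` arguments, and the timeout value are assumptions, and the reference numerals 107 and 102 stand in for the IDs of the other leaf switches.

```python
from typing import Dict, List, Tuple

TimerKey = Tuple[str, int]  # (local network-side port, other leaf switch ID)


class ProbeTimers:
    def __init__(self, ports: List[str], other_leaves: List[int], timeout: float):
        self.timeout = timeout
        # Step 401: one timer per (port, other leaf switch) pair, started at time 0.
        self.last_sent: Dict[TimerKey, float] = {
            (port, leaf): 0.0 for port in ports for leaf in other_leaves
        }

    def packet_sent(self, port: str, leaf: int, now: float) -> None:
        # Step 402: a packet went out of this port to this leaf; restart the timer.
        self.last_sent[(port, leaf)] = now

    def expired(self, now: float) -> List[TimerKey]:
        # Step 403: pairs for which a first data packet should be sent actively.
        return [key for key, t in self.last_sent.items() if now - t >= self.timeout]


timers = ProbeTimers(["Port A", "Port B"], [107, 102], timeout=1.0)
timers.packet_sent("Port A", 102, now=0.8)  # normal traffic restarts timer 2
due = timers.expired(now=1.0)
print(len(timers.last_sent), due)
```

  • In this run, four timers exist and the timer for Port A and leaf switch 102 is the only one that has not expired, because ordinary traffic restarted it. After actively sending a first data packet for an expired pair, a caller would invoke `packet_sent` again to restart that timer.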
  • FIG. 9 is a flowchart of a third application scenario based on the method for collecting congestion information and the method for determining an optimal path provided by an embodiment of the present application.
  • In this scenario, a spine switch is used as the congestion information calculation node, and bandwidth utilization is used as the criterion for judging path congestion; the scenario is illustrated by an example.
  • the optimal path determination method for load balancing shown in FIG. 9 includes at least steps 501 to 510.
  • the source server 105 is configured to send packets to the destination server 106 .
  • All spine switches are congestion information calculation nodes and need to enable the congestion information calculation function; that is, the first spine switch 103 and the second spine switch 104 must both enable the function. The leaf switches serve only as forwarding nodes and need not enable the function; that is, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107 only need to forward packets normally.
  • Step 501 The source server sends an original packet.
  • Step 502 The first leaf switch normally forwards the original packet.
  • Step 503 the spine switch obtains congestion-related data.
  • Step 504 The spine switch calculates congestion information according to the congestion-related data.
  • Step 505 the spine switch inserts the congestion information to obtain intermediate packets.
  • Step 506 The second leaf switch determines whether there is congestion information in the intermediate packet. If so, step 507 is performed; otherwise, step 511 is performed. Specifically, the second leaf switch makes this determination after receiving the intermediate packet forwarded by the spine switch.
  • Step 507 The second leaf switch parses out the congestion information. Specifically, after parsing the congestion information, the second leaf switch obtains the ID information of the first leaf switch, the path port information of the first leaf switch and the related path congestion data, as well as the ID information of the second leaf switch, the path port information of the second leaf switch and the related path congestion data.
  • Step 508 The second leaf switch calculates the congestion information of one of the end-to-end paths from the second leaf switch to the first leaf switch, and updates it to the local congestion state table.
  • Step 509 Delete the congestion information, obtain the original packet, and forward the original packet to the destination server.
  • Step 510 The destination server receives the original message.
  • Step 511 The second leaf switch normally forwards the intermediate packet.
  • the activation of the congestion information calculation function in the optimal path determination method for load balancing in the third application scenario is similar to the activation of the congestion information calculation function in the optimal path determination method for load balancing in the first application scenario.
  • In this scenario, the spine switch performs the congestion information calculation function.
  • The user can enable the congestion information calculation function through configuration on the spine switch.
  • the congestion information may be network-side port bandwidth utilization information (including outbound direction or inbound direction) or any other information that can identify a port or link congestion state.
  • In step 501, the original message sent by the source server 105 to the destination server 106 is assumed to be forwarded along the following path, as shown in FIG. 5: Port C of the first leaf switch 101 -> Port A of the first leaf switch 101 -> Port A of the first spine switch 103 -> Port C of the first spine switch 103 -> Port A of the second leaf switch 102 -> Port C of the second leaf switch 102.
  • the first leaf switch 101 normally forwards the original packet. As shown in FIG. 5 , the first leaf switch 101 receives the original packet from Port C of the device, forwards it normally, and sends it from Port A of the device.
  • the congestion-related data includes: ingress port information and egress port information of the original packet.
  • the spine switch normally forwards the original packet, and can obtain the ingress port information and egress port information of the original packet.
  • the ingress port refers to the port through which packets enter the spine switch
  • the egress port refers to the port through which packets leave the spine switch.
  • the ingress port of the spine switch is Port A
  • the egress port of the spine switch is Port C.
  • the congestion related data further includes: ID information of the first leaf switch and related path port information.
  • Step 503 includes:
  • Obtaining the ID information of the first leaf switch and the related path port information. In this embodiment, the path port of the first leaf switch refers to the port of the first leaf switch that is directly connected to the ingress port of the packet on the spine switch; the spine switch can obtain this information through static configuration or a dynamic protocol (such as the LLDP protocol). In FIG. 5, Port A of the first spine switch 103 is the ingress port of the packet, and Port A of the first leaf switch 101 is the path port of the first leaf switch.
  • The congestion-related data also includes the congestion value in the outbound direction of the ingress port of the packet.
  • Step 503 also includes:
  • the spine switch can obtain the congestion information in the outbound direction of the ingress port from the ingress port ID number of the packet combined with its locally calculated congestion information.
  • In FIG. 5, Port A is the ingress port of the packet on the first spine switch 103, and the congestion information in the outbound direction of the port refers to the congestion information of the downlink from Port A of the first spine switch 103 to Port A of the first leaf switch 101.
  • the congestion-related data further includes: ID information of the second leaf switch and related path port information.
  • Step 503 also includes:
  • the path port of the second leaf switch refers to the port of the second leaf switch 102 that is directly connected to the egress port of the packet on the spine switch; the spine switch can obtain this information through static configuration or a dynamic protocol (such as the LLDP protocol); in FIG. 5, Port C of the first spine switch 103 is the egress port of the packet, and Port A of the second leaf switch 102 is the path port of the second leaf switch.
  • The congestion-related data also includes the congestion value in the inbound direction of the egress port of the packet.
  • Step 503 also includes:
  • the spine switch can obtain the congestion information in the inbound direction of the egress port from the egress port ID number of the packet combined with its locally calculated congestion information; this information represents the congestion of the uplink from the leaf to the spine. In this embodiment, the information can be the bandwidth utilization value of the port in the inbound direction; in FIG. 5, Port C is the egress port of the packet on the first spine switch 103, and the congestion information in the inbound direction of the port refers to the congestion information of the uplink from Port A of the second leaf switch 102 to Port C of the first spine switch 103.
  • Step 504 includes:
  • The spine switch calculates the congestion value of one of the paths from the second leaf switch to the first leaf switch according to the congestion-related data. Specifically, the spine switch selects the larger of the two congestion values obtained in step 503 (the value indicating the heaviest congestion) as the congestion value of one of the paths from the second leaf switch 102 to the first leaf switch 101, and uses it as the congestion information inserted into the original message, thereby obtaining an intermediate message that includes the congestion information. In this embodiment, the path is uniquely determined by the ID information and related path port of the second leaf switch and the ID information and related path port of the first leaf switch.
  • The spine switch calculates the congestion value of an end-to-end path from the second leaf switch to the first leaf switch according to the following congestion-related data: the ID number of the first leaf switch, the path port ID number of the first leaf switch, the ID number of the second leaf switch, and the path port number of the second leaf switch.
  • the calculated end-to-end path is uniquely determined by the ID information and related path ports of the second leaf switch and the ID information and related path ports of the first leaf switch.
  • the first spine switch 103 receives a packet from Port A of the device, and sends a packet from Port C of the device.
  • the first spine switch 103 obtains the ID information of the first leaf switch and the ID number of the path port Port A of the first leaf switch according to the information of the incoming port Port A of the message;
  • the first spine switch 103 obtains the bandwidth utilization in the outbound direction of Port A according to the information of the inbound port Port A of the packet;
  • the first spine switch 103 obtains the ID information of the second leaf switch and the ID number of the path port Port A of the second leaf switch according to the outgoing port Port C information of the packet;
  • the first spine switch 103 obtains the bandwidth utilization in the inbound direction of Port C according to the information of the outgoing port Port C of the packet;
  • the first spine switch 103 compares the bandwidth utilization in the outbound direction of Port A with the bandwidth utilization in the inbound direction of Port C, and takes the larger value as the congestion value of the path;
  • the first spine switch 103 inserts the relevant congestion value into the original packet as congestion information.
  • the relevant congestion value may include: the ID number of the first leaf switch (for example, the ID number of the first leaf switch 101 in FIG. 5); the path port number information of the first leaf switch (for example, the ID number of Port A of the first leaf switch 101 in FIG. 5); the ID number of the second leaf switch (for example, the ID number of the second leaf switch 102 in FIG. 5); and the path port number information of the second leaf switch (for example, the ID number of Port A of the second leaf switch 102 in FIG. 5).
  • the congestion value of a certain path from the second leaf switch to the first leaf switch is uniquely determined by leaf3 switch Port A and leaf1 switch Port A.
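The spine-side calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `CongestionInfo` record, the `path_congestion` function, and the representation of bandwidth utilization as a fraction between 0 and 1 are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CongestionInfo:
    """Hypothetical record for the congestion information a spine inserts."""
    first_leaf_id: int     # leaf behind the spine's ingress port (e.g. 101)
    first_leaf_port: str   # path port on that leaf (e.g. "A")
    second_leaf_id: int    # leaf behind the spine's egress port (e.g. 102)
    second_leaf_port: str  # path port on that leaf (e.g. "A")
    congestion: float      # congestion value of the path


def path_congestion(ingress_out_util: float, egress_in_util: float) -> float:
    """Return the larger (more heavily congested) of the two utilizations.

    ingress_out_util: outbound utilization of the spine's ingress port (Port A).
    egress_in_util:   inbound utilization of the spine's egress port (Port C).
    """
    return max(ingress_out_util, egress_in_util)


# Illustrative numbers only: Port A outbound at 35 %, Port C inbound at 60 %.
info = CongestionInfo(101, "A", 102, "A", path_congestion(0.35, 0.60))
print(info.congestion)  # 0.6
```

Taking the maximum treats the busier of the two links as the bottleneck of the path from the second leaf switch to the first leaf switch.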
  • Step 506 The second leaf switch determines whether there is congestion information in the intermediate packet. If so, step 507 is performed; otherwise, step 511 is performed. That is, after the intermediate packet is forwarded to it, the second leaf switch checks whether the packet contains congestion information.
  • Step 507 The second leaf switch parses out the congestion information; specifically, after parsing out the congestion information, the second leaf switch obtains the ID information of the first leaf switch, the path port information of the first leaf switch, the ID information of the second leaf switch, the path port information of the second leaf switch, and the related path congestion data.
  • Step 508 The second leaf switch calculates the congestion information of one of the end-to-end paths from the second leaf switch to the first leaf switch, and updates it to the local congestion state table.
  • Step 509 Delete the congestion information, obtain the original packet, and forward the original packet to the destination server.
  • Step 510 The destination server receives the original message.
  • Step 511 The second leaf switch normally forwards the intermediate packet.
  • step 506 includes:
  • the second leaf switch 102 receives the intermediate message from Port A of the device
  • the second leaf switch 102 identifies the information of the intermediate packet, and determines whether there is congestion information inserted in the intermediate packet.
  • if congestion information is present in the intermediate packet, step 507 is executed. Specifically, step 507 includes:
  • the ID information of the path port of the second leaf switch is obtained from the congestion information or from the ingress port of the packet.
  • Step 508 includes:
  • the second leaf switch 102 combines the ID information and related path port number information of the first leaf switch with the ID information and related path port number information of the second leaf switch, and uses them to index the corresponding path entry in the local congestion state table; the format of the local congestion state table is shown in Table 1 above;
  • the second leaf switch 102 updates the relevant path information.
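Steps 506 to 511 on the second leaf switch can be sketched as below. This is an illustrative sketch only: packets are modelled as plain dictionaries, the `congestion_info` field name and the table key layout `(local path port, remote leaf ID, remote path port)` are assumptions introduced here, and the actual layout of Table 1 is not reproduced.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table key: (local path port, remote leaf ID, remote path port).
PathKey = Tuple[str, int, str]


def handle_intermediate_packet(
    packet: dict,
    ingress_port: str,
    table: Dict[PathKey, float],
) -> dict:
    """Update the local congestion state table and return the packet to forward."""
    info: Optional[dict] = packet.get("congestion_info")
    if info is None:
        # Step 511: no congestion information, forward the packet unchanged.
        return packet

    # Step 507: parse the congestion information. The local path port comes
    # from the congestion information or, failing that, from the ingress port.
    local_port = info.get("second_leaf_port", ingress_port)
    key = (local_port, info["first_leaf_id"], info["first_leaf_port"])

    # Step 508: index the path entry and update it.
    table[key] = info["congestion"]

    # Step 509: delete the congestion information to recover the original packet.
    return {k: v for k, v in packet.items() if k != "congestion_info"}


table: Dict[PathKey, float] = {}
intermediate = {
    "payload": b"data",
    "congestion_info": {
        "first_leaf_id": 101, "first_leaf_port": "A",
        "second_leaf_id": 102, "second_leaf_port": "A",
        "congestion": 0.6,
    },
}
original = handle_intermediate_packet(intermediate, "A", table)
```

After the call, `table` holds one entry for the path through Port A toward leaf 101, and `original` no longer carries the congestion field.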
  • when the source server 105 under the first leaf switch 101 sends packets to the destination server 106 under the second leaf switch 102, some traffic is forwarded along other paths, as shown in FIG. 7: Port C of the first leaf switch 101 -> Port B of the first leaf switch 101 -> Port A of the second spine switch 104 -> Port C of the first spine switch 103 -> Port B of the second leaf switch 102 -> Port C of the second leaf switch 102.
  • the second leaf switch 102 obtains the congestion status of the end-to-end path from itself to the first leaf switch 101; this path is determined by the uplink from Port B of the second leaf switch 102 to Port C of the spine2 switch and the downlink from Port A of the spine2 switch to Port B of the first leaf switch 101.
  • the second leaf switch 102 obtains the congestion status of all end-to-end paths from itself to the first leaf switch 101.
  • FIG. 10 is a flowchart of the fourth application scenario based on the method for collecting congestion information and the method for determining an optimal path provided by an embodiment of the present application.
  • in this scenario, the spine switch calculates the congestion information, and timers are configured on the leaf switch, one for each combination of a local network-side port and another leaf switch.
  • FIG. 10 adds at least steps 601 to 603 on the basis of FIG. 9 .
  • if the server nodes under the first leaf switch send no packets to the server nodes under the second leaf switch for a long time, the second leaf switch cannot update the congestion status of the path from itself to the first leaf switch in time.
  • to prevent this, a timer can be configured on the leaf switch for each combination of a local network-side port and another leaf switch. If the timer expires, the first leaf switch actively sends a congestion packet from the relevant network-side port to the relevant leaf switch.
  • Step 601 Configure a timer for the corresponding leaf switch.
  • Step 601 is similar to step 401 .
  • one timer is configured for each combination of a network-side port of the local switch and one of the other leaf switches.
  • four timers need to be configured on the first leaf switch 101.
  • the relationship between the timers and the local network side interface and other leaf switches can be referred to as shown in Table 2 above.
  • Timer 1 corresponds to Port A of the first leaf switch 101, and the third leaf switch 107
  • Timer 2 corresponds to Port A of the first leaf switch 101, and the second leaf switch 102
  • Timer 3 corresponds to Port A of the first leaf switch 101, and the third leaf switch 107
  • Timer 4 corresponds to Port B of the first leaf switch 101, and the second leaf switch 102.
  • Step 602 Clear the corresponding timer to zero.
  • Step 602 is similar to step 402 . Specifically, when the leaf switch sends a packet from a certain network port of itself to other leaf switches, the related timer is cleared and the timing is restarted. For example, in Table 2, if the first leaf switch 101 sends a packet from its own Port A to the second leaf switch 102, the timer 2 is cleared and the timing is restarted.
  • Step 603 If the timer times out, the leaf switch sends a second data packet to the corresponding other leaf switches.
  • Step 603 is similar to step 403, except that the first data packet is sent in step 403, whereas the second data packet is sent in step 603. Specifically, if the timer times out, the leaf switch has not sent any packet from the relevant network-side port to the relevant leaf switch within the time T; in this case, the leaf switch needs to actively send the second data packet, through the network-side port related to the timer, to the leaf switch related to the timer.
  • the destination IP of the second data packet may be the IP address of the relevant leaf switch itself.
  • the first data packet includes at least the congestion information in the inbound direction of the timer-related port, whereas the second data packet does not include congestion information; the other content of the second data packet can be defined as needed.
  • for example, with the timers configured as shown in Table 2, if timer 2 times out, then because timer 2 is associated with the second leaf switch 102 and with the local network-side interface Port A, the first leaf switch 101 actively sends the second data packet from its Port A to the second leaf switch 102.
  • the destination IP address of the second data packet is the own IP of the second leaf switch 102, and the second data packet does not include congestion information.
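Steps 601 to 603 can be sketched with one timer per (network-side port, other leaf switch) pair. The sketch below is an assumption-laden illustration: the `LeafTimers` class, the explicit `now` arguments (used instead of a real clock so the behaviour is deterministic), the dictionary packet model, and the example IP addresses are all hypothetical. It also assumes that sending the second data packet restarts the timer, since that packet is itself traffic sent from the port to the leaf switch.

```python
from typing import Dict, Iterable, List, Tuple

TimerKey = Tuple[str, int]  # (local network-side port, other leaf switch ID)


class LeafTimers:
    """One timer per (network-side port, other leaf switch) pair (step 601)."""

    def __init__(self, ports: Iterable[str], other_leaves: Iterable[int],
                 timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        leaves = list(other_leaves)
        self.last_sent: Dict[TimerKey, float] = {
            (port, leaf): now for port in ports for leaf in leaves
        }

    def on_packet_sent(self, port: str, leaf: int, now: float) -> None:
        """Step 602: clear the related timer and restart timing."""
        self.last_sent[(port, leaf)] = now

    def expired(self, now: float) -> List[TimerKey]:
        """Pairs over which nothing has been sent for at least `timeout`."""
        return sorted(k for k, t in self.last_sent.items()
                      if now - t >= self.timeout)

    def send_second_packets(self, now: float,
                            leaf_ips: Dict[int, str]) -> List[dict]:
        """Step 603: build a second data packet for every expired timer."""
        packets = []
        for port, leaf in self.expired(now):
            # The destination is the leaf switch's own IP; the packet
            # deliberately carries no congestion information.
            packets.append({"out_port": port, "dst_ip": leaf_ips[leaf]})
            self.last_sent[(port, leaf)] = now
        return packets


# Two network-side ports and two other leaf switches give four timers.
timers = LeafTimers(["A", "B"], [102, 107], timeout=5.0)
timers.on_packet_sent("A", 102, now=4.0)
second_packets = timers.send_second_packets(
    now=5.0, leaf_ips={102: "192.0.2.102", 107: "192.0.2.107"})
```

In this run only the pair (Port A, leaf 102) saw recent traffic, so second data packets are built for the other three pairs.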
  • FIG. 11 is a new flowchart of the fifth application scenario based on the method for collecting congestion information and the method for determining an optimal path provided by the embodiment of the present application.
  • this scenario is illustrated with the leaf switch calculating the congestion information, a timer configured on the leaf switch, and congestion information inserted into packets periodically.
  • the method in FIG. 11 adds at least steps 701 to 703 .
  • the leaf switch calculates the congestion information.
  • the congestion information consumes a certain network bandwidth.
  • a timer is configured for each combination of a local network-side port and another leaf switch.
  • Step 701 Configure a timer for the corresponding leaf switch.
  • Step 701 is similar to step 401 .
  • one timer is configured for each combination of a network-side port of the local switch and one of the other leaf switches.
  • four timers need to be configured on the first leaf switch 101.
  • the relationship between the timers and the local network side interface and other leaf switches can be referred to as shown in Table 2 above.
  • Timer 1 corresponds to Port A of the first leaf switch 101, and the third leaf switch 107
  • Timer 2 corresponds to Port A of the first leaf switch 101, and the second leaf switch 102
  • Timer 3 corresponds to Port A of the first leaf switch 101, and the third leaf switch 107
  • Timer 4 corresponds to Port B of the first leaf switch 101, and the second leaf switch 102.
  • Step 702 If the timer reaches the preset time, stop timing.
  • Step 703 The first leaf switch determines that the packet needs to be sent from the network side port.
  • step 703 specifically,
  • the first leaf switch finds the corresponding second leaf switch ID according to the destination IP
  • the first leaf switch obtains the corresponding timer information according to the network side port ID and the second leaf switch ID; if the timer times out, then:
  • the first leaf switch needs to insert congestion information into the original packet
  • the timer restarts.
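The timer-gated insertion in step 703 can be sketched as follows. This is a hedged illustration rather than the patent's implementation: the function name `send_from_leaf`, the `leaf_by_ip` lookup table, the `next_insert_at` state dictionary, the `congestion_info` field, and the explicit `now` argument are assumptions made for the example.

```python
from typing import Dict, Tuple

TimerKey = Tuple[str, int]  # (network-side port ID, destination leaf ID)


def send_from_leaf(
    packet: dict,
    out_port: str,
    local_leaf_id: int,
    leaf_by_ip: Dict[str, int],
    inbound_util: Dict[str, float],
    next_insert_at: Dict[TimerKey, float],
    period: float,
    now: float,
) -> dict:
    """Insert congestion information only when the related timer has expired."""
    # Find the second leaf switch ID from the destination IP.
    dst_leaf = leaf_by_ip[packet["dst_ip"]]
    # Find the timer from the network-side port ID and the second leaf ID.
    key = (out_port, dst_leaf)
    if now < next_insert_at.get(key, 0.0):
        return packet  # timer still running: send the original packet as is

    # Timer expired: insert congestion information and restart the timer.
    stamped = dict(packet)
    stamped["congestion_info"] = {
        "leaf_id": local_leaf_id,
        "port": out_port,
        "congestion": inbound_util[out_port],
    }
    next_insert_at[key] = now + period
    return stamped


state: Dict[TimerKey, float] = {}
routes = {"192.0.2.6": 102}   # hypothetical destination-IP-to-leaf mapping
util = {"A": 0.4}             # hypothetical inbound utilization of Port A
pkt = {"dst_ip": "192.0.2.6", "payload": b"data"}

first = send_from_leaf(pkt, "A", 101, routes, util, state, period=5.0, now=0.0)
second = send_from_leaf(pkt, "A", 101, routes, util, state, period=5.0, now=1.0)
third = send_from_leaf(pkt, "A", 101, routes, util, state, period=5.0, now=5.0)
```

Only the first and third packets carry congestion information, so the extra bytes are paid once per period rather than on every packet.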
  • all leaf switches will finally obtain the congestion information of the end-to-end paths from themselves to the other leaf switches, and calculate the least congested path to each of the other leaf switches.
  • the first leaf switch may select the least congested path for forwarding on a per-flowlet basis.
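Least-congested path selection with flowlets might look like the sketch below. The text does not specify how flowlets are detected, so the idle-gap rule used here (a flow that has been silent for at least `gap` seconds starts a new flowlet) is an assumption, as are the function names and the table key layout `(local path port, remote leaf ID, remote path port)`.

```python
from typing import Dict, Tuple

PathKey = Tuple[str, int, str]  # (local path port, remote leaf ID, remote port)


def least_congested_port(table: Dict[PathKey, float], dst_leaf: int) -> str:
    """Return the local port of the least congested known path to dst_leaf."""
    candidates = [key for key in table if key[1] == dst_leaf]
    if not candidates:
        raise LookupError(f"no known path to leaf {dst_leaf}")
    best = min(candidates, key=lambda key: table[key])
    return best[0]


def pick_port(flow_id: str, dst_leaf: int, now: float,
              table: Dict[PathKey, float],
              flowlets: Dict[str, Tuple[str, float]], gap: float) -> str:
    """Keep the current port within a flowlet; re-select after an idle gap."""
    entry = flowlets.get(flow_id)
    if entry is not None and now - entry[1] < gap:
        port = entry[0]  # same flowlet: stay on the path to avoid reordering
    else:
        port = least_congested_port(table, dst_leaf)
    flowlets[flow_id] = (port, now)
    return port


table = {("A", 102, "A"): 0.6, ("B", 102, "B"): 0.2, ("A", 107, "A"): 0.1}
flowlets: Dict[str, Tuple[str, float]] = {}

p1 = pick_port("f1", 102, 0.0, table, flowlets, gap=0.5)  # new flowlet
table[("B", 102, "B")] = 0.9                              # Port B path congests
p2 = pick_port("f1", 102, 0.1, table, flowlets, gap=0.5)  # same flowlet
p3 = pick_port("f1", 102, 1.0, table, flowlets, gap=0.5)  # idle gap elapsed
```

The flow stays on Port B while packets arrive back to back and moves to Port A only after the idle gap, once the Port B path has become the more congested one.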
  • only the node that calculates the congestion information (a leaf switch or a spine switch) needs to enable the congestion information calculation function, and this function does not need to be deployed across the entire network, which reduces deployment cost and difficulty; throughout the packet transmission process, the leaf switch only needs to insert the relevant congestion information into the original packet.
  • the congestion information is updated faster, and the effect of traffic load balancing is more obvious.
  • the embodiment of the present application also provides a device for collecting congestion information in a spine-and-leaf network, which can implement the above-mentioned method for collecting congestion information in a spine-and-leaf network, and the device includes:
  • a network-side port determination module configured to determine a network-side port
  • a first congestion information acquisition module configured to acquire congestion information related to a network side port
  • the first path port determining module is configured to determine the path port of the first leaf switch according to the configuration policy
  • the first inserting module is configured to insert the congestion information into the original message according to the path port to obtain the intermediate message
  • the first forwarding module is configured to send the intermediate message.
  • the embodiment of the present application also provides another device for collecting congestion information in a spine-and-leaf network, which can implement the above-mentioned method for collecting congestion information in a spine-and-leaf network, and the device includes:
  • the message obtaining module is configured to obtain the original message sent by the first leaf switch from the network side port;
  • the second congestion information acquisition module is configured to acquire congestion information related to the network side port
  • a second path port determining module configured to determine the path port of the first leaf switch
  • the second inserting module is configured to insert the congestion information into the original message according to the path port to obtain the intermediate message
  • the second forwarding module is configured to send the intermediate packet to the second leaf switch.
  • the embodiment of the present application also provides a device for determining an optimal path in another spine-and-leaf network, which can realize the method for determining an optimal path in the above-mentioned spine-and-leaf network, and the device includes:
  • the message receiving module is configured to receive the intermediate message sent by the first leaf switch through the spine switch;
  • a congestion information determination module configured to determine that there is congestion information in the intermediate message
  • a parsing module configured to parse out the congestion information from the intermediate message
  • the calculation module is configured to calculate the minimum congested path according to the congestion information, and determine the minimum congested path as the optimal path.
  • the embodiment of the present application also provides a network switch of a spine-and-leaf network, including: at least one memory; at least one processor; and at least one program;
  • the program is stored in the memory, and the processor executes the at least one program to implement the above-mentioned method for collecting congestion information in a spine-and-leaf network or the method for determining an optimal path in a spine-and-leaf network.
  • the network switch can be a leaf switch or a spine switch.
  • Embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to cause a computer to execute the above-mentioned method for collecting congestion information in a spine-and-leaf network or the above-mentioned method for determining an optimal path in a spine-and-leaf network.
  • the memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device.
  • the memory may include memory located remotely from the processor, which may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the method for collecting congestion information in a spine-and-leaf network, the method for determining an optimal path, the network switch, and the computer-readable storage medium proposed by the embodiments of the present application involve: determining a network-side port; acquiring congestion information related to the network-side port; determining the path port of the first leaf switch according to the configuration policy; inserting the congestion information into the original packet according to the path port to obtain an intermediate packet; and sending the intermediate packet.
  • path congestion information can be collected, and the congestion information can be inserted into the original packet.
  • the second leaf switch can parse out the congestion information and calculate the optimal path based on the congestion information.
  • the leaf switch can forward traffic according to the optimal path, improving the throughput of the entire network.
  • FIGS. 1-11 do not constitute limitations on the embodiments of the present application, which may include more or fewer steps than those shown in the drawings, combine certain steps, or use different steps.
  • "At least one (item)" refers to one or more, and "a plurality" refers to two or more.
  • "And/or" describes the relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one of the following item(s)" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • the disclosed apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes multiple instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.

Abstract

Provided are a method for collecting congestion information in a spine-and-leaf network, a method for determining an optimal path, a network switch, and a computer-readable storage medium. The method for collecting congestion information comprises: determining a network-side port (S201); obtaining congestion information associated with said network side port (S202); determining a path port of a first leaf switch according to a configuration policy (S203); inserting the congestion information into an original packet according to said path port to obtain an intermediate packet (S204); sending out said intermediate packet (S205).

Description

Congestion information collection method, optimal path determination method, and network switch
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202011083906.6, filed on October 12, 2020, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the field of communication technologies, and in particular to a method for collecting congestion information in a spine-and-leaf network, a method for determining an optimal path, a network switch, and a computer-readable storage medium.
Background
Data centers adopt fat-tree networking, in which a large number of equal-cost paths exist for communication between servers. Multipath load balancing can balance traffic across different paths and greatly improve network throughput and availability. A common scheme is Equal-Cost Multipath Routing (ECMP). ECMP is a hop-by-hop, flow-based load balancing strategy: when a router finds multiple equal-cost paths to the same destination address, it updates the routing table and adds multiple rules for that destination address, corresponding to multiple next hops. These equal-cost paths can then be used simultaneously to forward traffic. However, ECMP has no congestion detection mechanism, so it is likely to aggravate congestion on a path that is already congested.
With the rise of software-defined technologies, schemes have emerged that use an SDN (Software Defined Network) controller for congestion detection. The principle is as follows: the SDN controller centrally collects the congestion status of all inter-switch links and, based on that status, calculates the optimized TOR (Top of Rack) to TOR paths in real time. If an optimized path changes, the SDN controller delivers the forwarding table information for the latest optimized path to the relevant switches, which update their local forwarding entries. The switches then forward traffic along the optimized path on a per-flow basis. The drawback of this scheme is that the SDN controller must collect congestion information for paths across the entire network, so the amount of computation is huge and it takes too long to update the optimized paths network-wide.
More recently, a scheme that uses multicast for congestion detection has been proposed for data centers. In this scheme, an aggregation switch periodically originates a congestion detection packet and multicasts it to the core switches. On receiving the packet, a core switch adds the congestion information of its own ports and multicasts the packet on to the other aggregation switches, so that eventually all TOR switches obtain the congestion information of the whole network. Each TOR switch calculates and updates the optimized paths to the other TOR switches, and traffic is finally forwarded along the optimized paths on a per-flow basis. In this scheme the switches must send congestion detection packets periodically; these packets not only consume network bandwidth within the data center but also increase the computational load on the forwarding devices.
Summary
The main purpose of the embodiments of the present application is to propose a method for collecting congestion information in a spine-and-leaf network, a method for determining an optimal path, a network switch, and a computer-readable storage medium.
A first aspect of the embodiments of the present application proposes a method for collecting congestion information in a spine-and-leaf network, including: determining a network-side port; acquiring congestion information related to the network-side port; determining a path port of a first leaf switch according to a configuration policy; inserting the congestion information into an original packet according to the path port to obtain an intermediate packet; and sending the intermediate packet.
A second aspect of the embodiments of the present application proposes a method for determining an optimal path in a spine-and-leaf network, including: receiving, through a spine switch, an intermediate packet sent by a first leaf switch; determining that congestion information exists in the intermediate packet; parsing the congestion information out of the intermediate packet; and calculating the least congested path according to the congestion information and determining the least congested path as the optimal path.
A third aspect of the embodiments of the present application proposes a method for collecting congestion information in a spine-and-leaf network, including: acquiring, from a network-side port, an original packet sent by a first leaf switch; acquiring congestion information related to the network-side port; determining a path port of the first leaf switch; inserting the congestion information into the original packet according to the path port to obtain an intermediate packet; and sending the intermediate packet to a second leaf switch.
A fourth aspect of the embodiments of the present application proposes a network switch for a spine-and-leaf network, including: at least one memory; at least one processor; and at least one program, where the program is stored in the memory and the processor executes the at least one program to implement the method according to the first aspect or the method according to the second aspect.
A fifth aspect of the embodiments of the present application proposes a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to cause a computer to execute the method according to the first aspect, the method according to the second aspect, or the method according to the third aspect.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of the method for determining an optimal path for load balancing provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of another application scenario of the method for determining an optimal path for load balancing provided by an embodiment of the present application.
FIG. 3 is a schematic diagram of yet another application scenario of the method for determining an optimal path for load balancing provided by an embodiment of the present application.
FIG. 4 is a flowchart of the method for determining an optimal path for load balancing provided by an embodiment of the present application.
FIG. 5 is a schematic diagram of an application scenario of the method for determining an optimal path for load balancing provided by the first embodiment of the present application.
FIG. 6 is a flowchart of the method for determining an optimal path for load balancing provided by the second embodiment of the present application.
FIG. 7 is a schematic diagram of another application scenario of the method for determining an optimal path for load balancing provided by the second embodiment of the present application.
FIG. 8 is a partial flowchart of a second application scenario of the method for determining an optimal path for load balancing provided by the third embodiment of the present application.
FIG. 9 is a flowchart of a third application scenario of the method for determining an optimal path for load balancing provided by the fourth embodiment of the present application.
FIG. 10 is a flowchart of a fourth application scenario of the method for determining an optimal path for load balancing provided by the fifth embodiment of the present application.
FIG. 11 is a flowchart of a fifth application scenario of the method for determining an optimal path for load balancing provided by the fifth embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present application, not to limit it.
It should be noted that although functional modules are divided in the device schematic diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the device, or in an order different from that in the flowcharts. The terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field to which the present application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
First, several terms used in the present application are explained:
Equal-Cost Multipath Routing (ECMP): a hop-by-hop, flow-based load balancing strategy. When a router finds multiple equal-cost paths to the same destination address, it updates the routing table and adds multiple rules for that destination address, corresponding to multiple next hops; when forwarding traffic, it can use the equal-cost paths simultaneously.
Fat-tree networking: networking based on a fat-tree topology.
Spine switch: in the present application, a switch to which leaf switches are connected.
Leaf switch: in the present application, a switch to which servers are connected.
Spine-leaf network structure: also called a spine-leaf topology; a network structure that includes leaf switches (which connect devices or servers) and spine nodes (which connect switches). It is an important part of the network topology of a data center.
Uplink: in the embodiments of the present application, a link from a leaf switch to a spine switch.
Downlink: in the embodiments of the present application, a link from a spine switch to a leaf switch.
Link Layer Discovery Protocol (LLDP): a data link layer protocol.
The application scenario of the embodiments of the present application is a data center network. In a data center network, multi-path load balancing is a technique for achieving high throughput, low latency, and high availability, and ECMP is the solution commonly used to implement it. ECMP is a hop-by-hop, flow-based load balancing strategy: when a router finds multiple equal-cost paths to the same destination address, it updates the routing table and adds multiple rules for that destination address, corresponding to multiple next hops. However, ECMP has no congestion detection mechanism, so it is likely to aggravate congestion on a path that is already congested. Based on this, the embodiments of the present application propose a method for collecting congestion information in a spine-leaf network, a method for determining an optimal path, a network switch, and a computer-readable storage medium, all applied to a two-layer spine-leaf network structure, so that traffic is forwarded along an optimized path and the throughput of the entire network is improved. These are described through the following embodiments, beginning with the method for determining an optimal path for load balancing.
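The limitation described above can be illustrated with a minimal sketch. It assumes a common ECMP realization in which the next hop is chosen by hashing the flow's 5-tuple; the hash function, the field layout, and the names `ecmp_next_hop`, `uplinks`, and `utilization` are illustrative assumptions, not details taken from the application.

```python
import zlib


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the flow identifier (illustrative hash choice)."""
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# Hypothetical flow: source IP, destination IP, protocol, source port, destination port.
flow = ("10.0.0.1", "10.0.1.1", 6, 40000, 80)
uplinks = ["Port A", "Port B"]

# Hypothetical utilization values. ECMP never reads them, so a flow hashed
# onto a congested uplink stays there.
utilization = {"Port A": 0.9, "Port B": 0.1}

choice = ecmp_next_hop(flow, uplinks)
```

Because the choice depends only on the flow identifier, repeated calls for the same flow return the same uplink whatever its load, which is the gap the congestion-aware methods below are meant to close.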
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present application. The method for collecting congestion information and the method for determining an optimal path are applied in a spine-leaf communication network (spine-leaf network) that includes at least a first leaf switch 101, a second leaf switch 102, and spine switches. In the embodiment shown in FIG. 1, the spine-leaf network is a two-layer network with a spine layer and a leaf layer; the spine layer includes the spine switches and the leaf layer includes the leaf switches. Each leaf switch is connected to a corresponding spine switch; the same leaf switch may be connected to multiple different spine switches, and the same spine switch may be connected to multiple different leaf switches. The first leaf switch 101 is the source leaf switch and is configured to connect the source server 105; the second leaf switch 102 is the destination leaf switch and is configured to connect the destination server 106.
The source server 105 is configured to send packets, and the destination server 106 is configured to receive packets. A packet sent by the source server 105 travels through the first leaf switch 101, a spine switch, and the second leaf switch 102 to the destination server 106, which receives the packet from the second leaf switch 102. In the embodiment illustrated in FIG. 1, the spine switches include a first spine switch 103 and a second spine switch 104. The first leaf switch 101 is configured to connect to the first spine switch 103, and the second leaf switch 102 is configured to connect to the second spine switch 104. The network system may also include a third leaf switch 107.
A leaf switch is a terminal device; the terminal device may include a desktop computer or a server. Each switch (leaf switch or spine switch) may include a router or the like.
In the embodiments of the present application, among the ports of a leaf switch, a port connected to a server (the source server 105 or the destination server 106) is a user-side port, and a port connected to a spine switch is a network-side port. For example, Port C of the first leaf switch 101 and Port C of the second leaf switch 102 are user-side ports; Port A and Port B of the first leaf switch 101 and Port A and Port B of the second leaf switch 102 are network-side ports. All ports on a spine switch are network-side ports; for example, Port A, Port B, and Port C of the first spine switch 103 are all network-side ports, and Port A, Port B, and Port C of the second spine switch 104 are all network-side ports.
In the embodiments of the present application, a direct connection between two switches is called a link, and the set of all links traversed by a packet from the source leaf switch to the destination leaf switch is called a path; these links pass through any intermediate switch (for example, a spine switch) between the source leaf switch and the destination leaf switch. An uplink is a link from a leaf switch to a spine switch; for example, the link from Port A of the second leaf switch 102 to Port C of the second spine switch 104 is an uplink. A downlink is a link from a spine switch to a leaf switch; for example, the link from Port A of the first spine switch 103 to Port A of the first leaf switch 101 is a downlink.
In the embodiments of the present application, an end-to-end path between leaf switches is a path from a network-side port of one leaf switch (the path head node) to a network-side port of another leaf switch (the path tail node). In the two-layer spine-leaf network shown in FIG. 1, such a path is uniquely determined by one uplink and one downlink and can be identified by its path head node and path tail node. For example, the path from Port A of the second leaf switch 102 to Port A of the first leaf switch 101 is an end-to-end path from the second leaf switch 102 to the first leaf switch 101. It is uniquely determined by one uplink (Port A of the second leaf switch 102 to Port C of the first spine switch 103) and one downlink (Port A of the first spine switch 103 to Port A of the first leaf switch 101), and is uniquely identified by the path head node (Port A of the second leaf switch 102) and the path tail node (Port A of the first leaf switch 101).
A packet between a pair of leaf switches (for example, the first leaf switch 101 and the second leaf switch 102 shown in FIG. 1) travels along a path from the source leaf switch to a spine switch and then to the destination leaf switch. The port on which a switch receives a packet is also called an ingress port and can be used to identify the switch that sent the packet; for example, as shown in FIG. 1, any packet arriving on the port can be determined to have been sent from the first leaf switch 101. It should be understood that a port may be a duplex port, that is, a single port may be used both for receiving packets (as an ingress port) and for sending packets (as an egress port).
Congestion may occur when a packet is transferred from an ingress port to an egress port within the same switch (for example, as shown in FIG. 1, when a packet is transferred from Port C to Port A of the first leaf switch 101). Congestion may depend on the port utilization of the switch, the transmission rate of the port, queue congestion at the port, processor and memory resources, and so on.
In one embodiment, each leaf switch may be connected to a given spine switch through a single link; in another embodiment, each leaf switch may be connected to a given spine switch through multiple links. FIG. 1 shows the case in which each leaf switch is connected to a given spine switch through a single link.
FIG. 2 is a schematic diagram of another application scenario of an embodiment of the present application. Unlike FIG. 1, each leaf switch in FIG. 2 is connected to a given spine switch through multiple links. For example, one link runs from Port A of the first leaf switch 101 to Port A of the first spine switch 103, and another runs from Port B of the first leaf switch 101 to Port B of the first spine switch 103; one link runs from Port E of the first leaf switch 101 to Port A of the second spine switch 104, and another runs from Port F of the first leaf switch 101 to Port A of the second spine switch 104; one link runs from Port A of the second leaf switch 102 to Port C of the first spine switch 103, and another runs from Port B of the second leaf switch 102 to Port D of the first spine switch 103; one link runs from Port E of the second leaf switch 102 to Port C of the second spine switch 104, and another runs from Port F of the second leaf switch 102 to Port D of the second spine switch 104.
FIG. 3 is a schematic diagram of yet another application scenario of an embodiment of the present application. Unlike FIG. 1, the method for determining an optimal path for load balancing shown in FIG. 3 is applied within a single network device. The device contains multiple chips that together implement the functions of the spine-leaf network structure shown in FIG. 1: the first leaf switch is realized as a first leaf module (for example, chip 1), the second leaf switch as a second leaf module (for example, chip 2), the third leaf switch as a third leaf module (for example, chip 3), the first spine switch as a first spine module (for example, chip 14), and the second spine switch as a second spine module (for example, chip 5). The source server may be communicatively connected to a first panel port, and the destination server may be communicatively connected to a second panel port.
FIG. 4 is a flowchart of the method for collecting congestion information in a spine-leaf network provided by an embodiment of the present application. The method in FIG. 4 includes at least steps 201 to 205 and is performed by the first leaf switch.
Step 201: Determine the network-side ports;
Step 202: Obtain the congestion information related to the network-side ports;
Step 203: Determine the path port of the first leaf switch according to the configured policy;
Step 204: Insert the congestion information into the original packet according to the path port to obtain an intermediate packet;
Step 205: Send the intermediate packet.
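Steps 201 to 205 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Port` and `Packet` classes, the `collect_and_send` function, the use of bandwidth utilization as the congestion datum, and the dictionary layout of the inserted information are all hypothetical and are not defined by the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    name: str
    network_side: bool
    utilization: float = 0.0  # bandwidth utilization in the range 0.0 to 1.0


@dataclass
class Packet:
    payload: bytes
    congestion: Optional[dict] = None  # None means an original packet


def collect_and_send(ports, original, select_path_port):
    """Build the intermediate packet on the first leaf switch."""
    network_ports = [p for p in ports if p.network_side]      # step 201
    info = {p.name: p.utilization for p in network_ports}     # step 202
    path_port = select_path_port(network_ports, original)     # step 203
    intermediate = Packet(                                     # step 204
        original.payload,
        {"path_port": path_port.name, "utilization": info[path_port.name]},
    )
    return path_port, intermediate  # step 205: the caller transmits the packet


ports = [Port("Port A", True, 0.3), Port("Port B", True, 0.6), Port("Port C", False)]
# Hypothetical policy: always report on the first network-side port.
port, packet = collect_and_send(ports, Packet(b"data"), lambda nps, pkt: nps[0])
```

The policy is passed in as a callable so that the egress-port and polling variants of step 203 can be swapped without changing the rest of the procedure.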
In step 201 of some embodiments, among the ports of a leaf switch, a port connected to a server (the source server 105 or the destination server 106) is a user-side port, and a port connected to a spine switch is a network-side port.
In step 202, the congestion information includes path port information and path congestion data, and the congestion information related to the network-side ports is obtained through a congestion information calculation node. The congestion information calculation node calculates the congestion information of its local network-side ports; in addition, when an original packet passes through the congestion information calculation node, the node inserts the locally calculated congestion information into the original packet according to the policy configured by the user.
In some embodiments, the congestion information calculation node may be a leaf switch; in other embodiments, it may be a spine switch. Specifically, if the congestion information calculation node is a leaf switch, all leaf switches in the leaf layer are congestion information calculation nodes (in FIG. 1, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107), and all spine switches in the spine layer are ordinary forwarding nodes. If the congestion information calculation node is a spine switch, all spine switches in the spine layer are congestion information calculation nodes (in FIG. 1, the first spine switch 103 and the second spine switch 104), and all leaf switches in the leaf layer are ordinary forwarding nodes. Packets can be forwarded by the forwarding nodes.
In some embodiments, the congestion information may be the congestion information of the egress port of the packet, the congestion information of one network-side port of the local device selected according to a certain policy, or the ingress-direction congestion information of all network-side ports of the local device. The embodiments of the present application do not limit the congestion information.
The congestion data may include the bandwidth utilization of a network-side port, timestamp information, or any other information that can indicate the congestion state of a port or link. In one embodiment, these congestion data may be used individually; in another embodiment, they may be used in combination.
In step 202, the congestion information calculation node obtains the congestion information related to all network-side ports of the local device. In some embodiments, with reference to FIG. 1, if the congestion information calculation node is a leaf switch, the first leaf switch 101 performs steps 201 to 205 above, and the spine switches act as ordinary forwarding nodes and forward packets normally. In other embodiments, with reference to FIG. 1, if the congestion information calculation node is a spine switch, the first spine switch 103 performs the method for collecting congestion information of another embodiment, described in detail later; after receiving the intermediate packet from the first spine switch 103, the second leaf switch 102 performs the method for determining an optimal path described later; and the first leaf switch 101 acts as an ordinary forwarding node and forwards packets normally.
In the embodiments of the present application, the original packet is the packet sent by the source server to the destination server. The original packet sent by the source server passes through the first leaf switch, a spine switch, and the second leaf switch, and is forwarded to the destination server by the second leaf switch.
It should be understood that the congestion information may be inserted at any position in the original packet; the embodiments of the present application do not limit the position at which the congestion information is inserted.
In some embodiments, step 203 includes:
determining the egress port of the original packet, or a network-side port of the first leaf switch, as the path port of the first leaf switch;
or,
polling all network-side ports of the first leaf switch and selecting one of them as the path port of the first leaf switch.
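The two selection policies for step 203 might look like the sketch below. It assumes that polling means a simple round-robin over the network-side ports, which is one plausible reading; the function names `egress_port_policy` and `round_robin_policy` are hypothetical.

```python
from itertools import cycle


def egress_port_policy(egress_port):
    """Always report on the port through which the original packet leaves."""
    return lambda: egress_port


def round_robin_policy(network_ports):
    """Report on each network-side port in turn (assumed meaning of polling)."""
    ports = cycle(network_ports)
    return lambda: next(ports)


pick = round_robin_policy(["Port A", "Port B"])
picked = [pick() for _ in range(3)]  # cycles A, B, then back to A
```

With round-robin selection, the congestion of every network-side port is eventually reported even when traffic leaves through only one of them.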
In some embodiments, the method for collecting congestion information in a spine-leaf network further includes:
configuring a timer for the corresponding leaf switch;
if a packet is sent from the network-side port to the second leaf switch, clearing the timer.
Further, the method for collecting congestion information in a spine-leaf network also includes:
if the timer expires, sending a first data packet from the network-side port corresponding to the timer to the second leaf switch corresponding to the timer, where the first data packet includes the congestion information.
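The timer behaviour can be sketched as a small state machine. The class name `ReportTimer`, the explicit `tick` interface, and the restart of the timer after it expires are assumptions made for illustration; the application states only that the timer is cleared when a packet is sent and that a packet carrying congestion information is sent when it expires.

```python
class ReportTimer:
    """One timer per (network-side port, destination leaf switch) pair."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.elapsed = 0.0

    def on_packet_sent(self):
        # Regular traffic already carries congestion information: clear the timer.
        self.elapsed = 0.0

    def tick(self, dt):
        """Advance time by dt; return True when a dedicated packet should be sent."""
        self.elapsed += dt
        if self.elapsed >= self.timeout:
            self.elapsed = 0.0  # assumed: the timer restarts after firing
            return True
        return False


timer = ReportTimer(timeout=1.0)
first = timer.tick(0.6)   # not expired yet
timer.on_packet_sent()    # a packet went out, so the timer is cleared
second = timer.tick(0.6)  # only 0.6 has elapsed since the clear
third = timer.tick(0.6)   # 1.2 elapsed with no traffic: expired
```

The effect is that a quiet path still reports its congestion state periodically, while a busy path never triggers the extra packet.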
In other embodiments, the method for collecting congestion information in a spine-leaf network further includes:
configuring a timer for the corresponding leaf switch;
determining that the original packet is sent from a network-side port;
determining the ID information of the second leaf switch;
if the timer expires, inserting the congestion information into the original packet according to the ID information of the network-side port and the ID information of the second leaf switch to obtain an intermediate packet.
Taking the case in which the congestion information calculation node is a leaf switch, an embodiment of the present application also provides a method for determining an optimal path in a spine-leaf network. The method is performed by the second leaf switch and includes:
receiving, through a spine switch, the intermediate packet sent by the first leaf switch;
determining that congestion information is present in the intermediate packet;
parsing the congestion information out of the intermediate packet;
calculating the least congested path according to the congestion information, and determining the least congested path as the optimal path.
In this embodiment, after receiving the intermediate packet, the second leaf switch parses the congestion information out of it and processes the parsed information to obtain the original packet, the path port information, and the path congestion data.
Specifically, if the congestion information calculation node is a leaf switch, the second leaf switch parses the congestion information inserted in the intermediate packet to obtain the congestion state of a downlink, uses the congestion information calculated by the local switch to obtain the congestion state of an uplink, and then updates the congestion state of the corresponding end-to-end path from the second leaf switch to the first leaf switch according to the uplink and downlink congestion states. If the congestion information calculation node is a spine switch, the second leaf switch parses the congestion information and related packet information inserted in the original packet to obtain congestion information about both the uplink and the downlink, and then updates the congestion state of the corresponding end-to-end path from the second leaf switch to the first leaf switch according to that congestion information.
The congestion information is finally saved in a local congestion state table. As an example, with reference to FIG. 1, the congestion state table updated by the second leaf switch may be as shown in Table 1. The column coordinate of Table 1 represents all leaf switches in the network (for example, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107), and the horizontal coordinate represents the different paths from the local leaf switch (for example, the first leaf switch 101) to the other leaf switches. Each path can be uniquely identified by its path head node (for example, Port A of the first leaf switch 101) and its path tail node (for example, Port A of the second leaf switch 102).
Figure PCTCN2021113568-appb-000001
Table 1
The congestion state of a path can be represented by the egress-direction port utilization of the path head node and the ingress-direction port utilization of the path tail node. In the embodiments of the present application, the larger of the two port utilizations is taken as the congestion information of the whole path. For example, in FIG. 5, consider a path from the second leaf switch 102 to the first leaf switch 101 whose path head node is Port A of the second leaf switch 102 and whose path tail node is Port A of the first leaf switch 101. If the bandwidth utilization of Port A of the second leaf switch 102 (egress direction of the path head node) is 0.3 and the bandwidth utilization of Port A of the first leaf switch 101 (ingress direction of the path tail node) is 0.7, the overall bandwidth utilization of the path is 0.7, that is, the maximum of 0.3 and 0.7.
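The max rule and the choice of the least congested path can be worked through in a short sketch. The table layout (a dictionary keyed by destination leaf, head port, and tail port), the identifier `leaf101`, and the second path's utilization values are hypothetical; only the 0.3 and 0.7 figures come from the example above.

```python
def path_congestion(head_egress_util, tail_ingress_util):
    """A path is as congested as the busier of its two end ports."""
    return max(head_egress_util, tail_ingress_util)


# Local congestion state table, keyed by (destination leaf, head port, tail port).
table = {
    ("leaf101", "Port A", "Port A"): path_congestion(0.3, 0.7),  # example from the text
    ("leaf101", "Port B", "Port B"): path_congestion(0.4, 0.2),  # hypothetical values
}


def best_path(table, destination):
    """Return the key of the least congested path to the destination leaf."""
    candidates = {key: value for key, value in table.items() if key[0] == destination}
    return min(candidates, key=candidates.get)


optimal = best_path(table, "leaf101")
```

Here the first path scores 0.7 and the second 0.4, so the second path is selected as the optimal path.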
With the optimal path determined in the above embodiment, when the first leaf switch sends packets to the second leaf switch, the first leaf switch can select the optimal path for forwarding on a per-flowlet basis.
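Flowlet-based forwarding can be sketched as below, assuming the usual meaning of a flowlet: a burst of packets from one flow separated from the next burst by an idle gap. The class name `FlowletRouter`, the gap threshold, and the `pick_best` callback are illustrative assumptions; the application does not specify how flowlets are detected.

```python
class FlowletRouter:
    def __init__(self, gap, pick_best):
        self.gap = gap              # idle time that starts a new flowlet (assumed value)
        self.pick_best = pick_best  # returns the current least congested path
        self.state = {}             # flow id -> (time last seen, assigned path)

    def route(self, flow, now):
        last = self.state.get(flow)
        if last is None or now - last[0] > self.gap:
            path = self.pick_best()  # new flowlet: re-evaluate the optimal path
        else:
            path = last[1]           # same flowlet: keep the path to avoid reordering
        self.state[flow] = (now, path)
        return path


# Hypothetical sequence of "best" paths as congestion changes over time.
paths = iter(["path-1", "path-2"])
router = FlowletRouter(gap=0.5, pick_best=lambda: next(paths))
r1 = router.route("flow-x", 0.0)  # first packet starts a flowlet
r2 = router.route("flow-x", 0.1)  # within the gap: same path
r3 = router.route("flow-x", 1.0)  # idle for 0.9 > 0.5: new flowlet, new path
```

Re-selecting the path only at flowlet boundaries lets traffic move away from congestion without reordering packets inside a burst.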
After the above steps are performed, the second leaf switch updates the congestion information of one of the end-to-end paths from the local switch (the local device, that is, the second leaf switch) to the first leaf switch, and the congestion information of that path is saved in the local congestion state table.
In some embodiments, the method for determining an optimal path further includes:
configuring a timer for the corresponding leaf switch;
if the timer expires, sending a second data packet from the network-side port corresponding to the timer to the second leaf switch corresponding to the timer, where the second data packet does not include congestion information.
Further, after the optimal path is determined, and after step 204, the method for determining an optimal path provided by the embodiments of the present application also includes:
deleting the parsed congestion information to obtain the original packet. Specifically, the inserted congestion information is stripped, that is, deleted, to recover the original packet, and the original packet is sent to the second leaf switch so that the second leaf switch forwards it to the destination server, which thus receives the original packet.
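Inserting and later stripping the congestion information can be sketched as a round trip. Since the application leaves the insertion position open, this sketch appends a fixed-size trailer; the marker `CGST`, the field layout (leaf ID, port ID, utilization as an integer percentage), and the function names are all hypothetical.

```python
import struct

MAGIC = b"CGST"                     # hypothetical marker for inserted information
TRAILER = struct.Struct("!4sHHB")   # marker, leaf ID, port ID, utilization percent


def insert_info(original, leaf_id, port_id, util_percent):
    """Build an intermediate packet by appending the congestion trailer."""
    return original + TRAILER.pack(MAGIC, leaf_id, port_id, util_percent)


def strip_info(packet):
    """Return (original packet, parsed info); info is None if no trailer is present."""
    tail = packet[-TRAILER.size:]
    if len(tail) == TRAILER.size and tail.startswith(MAGIC):
        _, leaf_id, port_id, util_percent = TRAILER.unpack(tail)
        return packet[:-TRAILER.size], (leaf_id, port_id, util_percent)
    return packet, None


intermediate = insert_info(b"payload", 101, 1, 70)
original, info = strip_info(intermediate)
```

A real encoding would need an unambiguous way to signal that the information is present, since a payload could end with the marker bytes by coincidence; the sketch ignores that case.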
The following describes the method for collecting congestion information in a spine-leaf network according to another embodiment, in which the congestion information calculation node is a spine switch. The method in this embodiment includes:
obtaining, from a network-side port, the original packet sent by the first leaf switch;
obtaining the congestion information related to the network-side port;
determining the path port of the first leaf switch;
inserting the congestion information into the original packet according to the path port to obtain an intermediate packet;
sending the intermediate packet to the second leaf switch.
With the method for collecting congestion information and the method for determining an optimal path described above, every leaf switch eventually obtains the congestion information of the end-to-end paths from itself to the other leaf switches and calculates the least congested path to each of them. When the first leaf switch sends packets to the second leaf switch, it can select the least congested path for forwarding on a per-flowlet basis. In the embodiments of the present application, only the congestion information calculation nodes (the leaf switches or the spine switches) need to enable the congestion information calculation function; the function does not have to be deployed across the whole network, which reduces deployment cost and difficulty. Throughout packet transmission, a leaf switch only needs to insert the relevant congestion information into the original packet, so the information overhead is smaller than in schemes that construct separate congestion packets out of band. The calculation, processing, and parsing of the congestion information are all performed on the switches, so compared with an SDN solution the congestion information is updated faster and the load balancing effect is more pronounced.
The methods are further described below with reference to specific application scenarios.
FIG. 6 is a flowchart of a first application scenario of the method for collecting congestion information and the method for determining an optimal path provided by an embodiment of the present application. In FIG. 6, by way of example, leaf switches serve as the congestion information calculation nodes and bandwidth utilization is used as the criterion for judging path congestion. The method in FIG. 6 includes steps 301 to 311.
With reference to FIG. 5, the source server 105 is configured to send a packet to the destination server 106. All leaf switches are congestion information calculation nodes; that is, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107 all need to enable the congestion information calculation function. The spine switches serve only as forwarding nodes and need not enable the function; that is, the first spine switch 103 and the second spine switch 104 only need to perform normal packet forwarding.
Step 301: The source server sends an original packet.
Step 302: The first leaf switch determines whether the egress port is a network-side port; if so, step 303 is performed; otherwise, step 304 is performed.
Step 303: The first leaf switch inserts the congestion information to obtain an intermediate packet.
Step 304: The first leaf switch forwards the original packet normally.
Step 305: The spine switch forwards the intermediate packet normally; specifically, the spine switch forwards the intermediate packet to the second leaf switch.
Step 306: The second leaf switch determines whether the intermediate packet contains congestion information; if so, step 307 is performed; otherwise, step 311 is performed. Specifically, the second leaf switch makes this determination after receiving the intermediate packet forwarded by the spine switch.
Step 307: The second leaf switch parses the congestion information. Specifically, after parsing, the second leaf switch obtains the ID information of the first leaf switch, the path port information of the first leaf switch and the related path congestion data, as well as the ID information of the second leaf switch, the path port information of the second leaf switch and the related path congestion data.
Step 308: The second leaf switch calculates the congestion information of one of the end-to-end paths from the second leaf switch to the first leaf switch and updates it in the local congestion state table.
Step 309: The second leaf switch deletes the congestion information to recover the original packet and forwards the original packet to the destination server.
Step 310: The destination server receives the original packet.
Step 311: The second leaf switch forwards the intermediate packet normally.
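The branching in steps 302 to 304 and 306 to 311 can be sketched as follows. This is a minimal illustration only: the `Packet` class, the dictionary layout of the congestion information, and both function names are assumptions made for the example, and step 308 is reduced to storing the received value rather than the full path calculation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    """Hypothetical packet model; congestion is None for an original packet."""
    payload: bytes
    congestion: Optional[dict] = None


def first_leaf_forward(pkt, egress_port, network_ports, local_info):
    """Steps 302-304 on the first leaf switch."""
    if egress_port in network_ports:        # step 302: is the egress port network-side?
        pkt.congestion = dict(local_info)   # step 303: original -> intermediate packet
    return pkt                              # step 304: otherwise forwarded unchanged


def second_leaf_receive(pkt, state_table):
    """Steps 306-311 on the second leaf switch."""
    if pkt.congestion is None:              # step 306: no congestion information
        return pkt                          # step 311: normal forwarding
    info = pkt.congestion                   # step 307: parse the congestion information
    # Step 308, simplified: record the received value per (switch, port).
    state_table[(info["switch"], info["port"])] = info["in_util"]
    pkt.congestion = None                   # step 309: strip to recover the original packet
    return pkt
```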
In practical application scenarios, a user can enable the congestion information calculation function on a leaf switch through configuration. Congestion information is associated with the network-side ports of the leaf switch; the user can identify which interfaces on the leaf switch are network-side interfaces through static configuration or by enabling a protocol (such as LLDP).
The congestion information may be bandwidth utilization information of a network-side port (in the outbound or inbound direction) or any other information that can identify the congestion state of a port or link.
With reference to FIG. 5, in step 301 the source server 105 sends an original packet to the destination server 106. Assume that the original packet is forwarded through the network along the following path: Port C of the first leaf switch 101 -> Port A of the first leaf switch 101 -> Port A of the first spine switch 103 -> Port C of the first spine switch 103 -> Port A of the second leaf switch 102 -> Port C of the second leaf switch 102.
The network-side port is determined in step 302. In step 302, the first leaf switch determines that the original packet is to be sent out of a network-side port, so it needs to insert congestion information into the original packet. For example, in FIG. 5, the first leaf switch finds that the egress port of the original packet is Port A, which is a network-side port, so it needs to insert the congestion information related to the path port of the first leaf switch into the original packet. Here, the path port of the first leaf switch is the port whose congestion information the first leaf switch 101 needs to insert. For example, as shown in FIG. 5, if the first leaf switch 101 needs to insert the congestion information of Port A, then Port A is a path port of the first leaf switch and Port B is not.
Step 303 includes: obtaining congestion information related to the network-side port, where the network-side port is the one determined in step 302.
Step 303 further includes: determining the path port according to a configuration policy. Specifically, the user can configure a policy on the first leaf switch, and the path port of the first leaf switch is determined according to that policy.
Specifically, a first configuration policy may be to select the egress port of the packet as the path port of the first leaf switch. In FIG. 5, the egress port of the packet on the first leaf switch 101 is Port A, so Port A can be used as the path port of the first leaf switch.
A second configuration policy may be to select one of the device's network-side ports as the path port of the first leaf switch in a preset manner, for example by polling (round-robin). In FIG. 5, the first leaf switch 101 may use polling to select one of its own network-side ports as the path port. With polling, when the first packet arrives, Port A is selected as the path port of the first leaf switch; when the second packet arrives, Port B can be selected. Because the first leaf switch 101 has only two network-side ports, Port A and Port B, this round of polling then ends. When the third packet arrives, the first leaf switch 101 selects Port A again, and so on.
A third configuration policy may be to use all network-side ports of the first leaf switch 101 as path ports at the same time. In FIG. 5, the first leaf switch 101 can use all of its network-side ports (Port A and Port B) as path ports of the first leaf switch simultaneously.
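The three configuration policies described above can be sketched as interchangeable selector functions. The policy names (`"egress"`, `"round_robin"`, `"all"`) and the `make_policy` helper are hypothetical labels chosen for this illustration; the source does not name them.

```python
from itertools import cycle


def make_policy(name, network_ports):
    """Return a function mapping a packet's egress port to a list of path ports."""
    ports = list(network_ports)
    if name == "egress":
        # First policy: the packet's egress port is the path port.
        return lambda egress: [egress]
    if name == "round_robin":
        # Second policy: poll the network-side ports in turn, one per packet.
        rotation = cycle(ports)
        return lambda egress: [next(rotation)]
    if name == "all":
        # Third policy: every network-side port is a path port simultaneously.
        return lambda egress: list(ports)
    raise ValueError(f"unknown policy: {name}")


select = make_policy("round_robin", ["Port A", "Port B"])
picks = [select("Port A")[0] for _ in range(3)]  # three packets arrive in sequence
```

With two network-side ports, the third packet wraps around to Port A again, matching the polling example in the text.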
After the configuration policy is determined, the first leaf switch 101 needs to insert the congestion information related to its path port into the original packet.
In some embodiments, each path port of the first leaf switch corresponds to one piece of congestion information. If the first leaf switch has multiple path ports, multiple pieces of congestion information need to be inserted. The embodiments of the present application do not limit the number of pieces of congestion information. Each piece may be inserted at any position in the packet.
Each piece of congestion information may describe one attribute of a path port of the first leaf switch (for example, the inbound bandwidth utilization of the port) or multiple attributes (for example, the bandwidth utilization in both the outbound and inbound directions of the port, timestamp information, and so on). Specifically, the attributes that the congestion information can describe include: the ID information of the first leaf switch, which includes an ID number that uniquely identifies the first leaf switch 101; the ID information of the path port of the first leaf switch, which uniquely identifies the port on the first leaf switch 101; and congestion attribute information, which refers to one or more congestion attributes related to the path port on the first leaf switch. For example, the inbound bandwidth utilization of the port is a first congestion attribute, and the timestamp at which the original packet entered the first leaf switch 101 is a second congestion attribute. In the embodiments of the present application, the congestion attributes must include the inbound bandwidth utilization of the path port of the first leaf switch, which indicates the congestion state of a downlink from a spine switch to the leaf switch. For example, in FIG. 5, if Port A is the path port of the first leaf switch, the congestion attributes of that port include the inbound bandwidth utilization, which represents the congestion of the downlink from Port A of the first spine switch 103 to Port A of the first leaf switch 101.
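One piece of congestion information, as described above, carries a switch ID, a path port ID, and at least the inbound bandwidth utilization. The record below is a sketch under stated assumptions: the field widths, the byte layout `!IHd`, and the class name are invented for illustration, since the source does not define a wire format or an insertion position.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: switch ID (u32), port ID (u16), inbound utilization (f64),
# in network byte order with no padding.
_LAYOUT = "!IHd"


@dataclass(frozen=True)
class CongestionRecord:
    switch_id: int    # uniquely identifies the inserting leaf switch
    port_id: int      # uniquely identifies the path port on that switch
    in_util: float    # mandatory attribute: inbound bandwidth utilization

    def pack(self) -> bytes:
        """Serialize the record for insertion into a packet."""
        return struct.pack(_LAYOUT, self.switch_id, self.port_id, self.in_util)

    @classmethod
    def unpack(cls, data: bytes) -> "CongestionRecord":
        """Rebuild the record from bytes parsed out of an intermediate packet."""
        return cls(*struct.unpack(_LAYOUT, data))


record = CongestionRecord(switch_id=101, port_id=1, in_util=0.7)
wire = record.pack()
```

Optional attributes such as a timestamp or the outbound utilization would extend the layout; they are left out here to keep the sketch short.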
The intermediate packet is then sent out of Port A of the first leaf switch 101.
In a specific embodiment, the processing on the first leaf switch is described. Referring to FIG. 5, the first leaf switch 101 receives the packet on its Port C and sends it out of its Port A. The first leaf switch 101 finds that the packet is sent out of a network-side port (Port A). The first leaf switch 101 then determines, through its locally configured policy, that Port A is the path port of the first leaf switch, so it needs to insert the congestion information related to Port A into the original packet. The congestion information inserted by the first leaf switch 101 may include the ID number of the first leaf switch. It may also include the path port number information of the first leaf switch, for example, the ID number of Port A of the first leaf switch 101. It may further include the congestion attributes related to the path port of the first leaf switch, for example, the inbound bandwidth utilization of Port A of the first leaf switch 101, which represents the congestion of the downlink from Port A of the first spine switch 103 to Port A of the first leaf switch 101.
In step 305, the first spine switch 103 receives the intermediate packet on its Port A, performs normal forwarding processing, and sends the intermediate packet out of its Port C.
In step 306, the second leaf switch 102 receives the intermediate packet on its Port A and inspects it; if it finds that the intermediate packet contains one or more inserted pieces of congestion information, step 307 is performed. In this example, the congestion information is inserted by the first leaf switch 101 and parsed by the second leaf switch 102.
In step 307, the second leaf switch 102 parses the congestion information. Specifically, after parsing, the second leaf switch obtains the ID information of the first leaf switch, the path port information of the first leaf switch and the related path congestion data, as well as the ID information of the second leaf switch, the path port information of the second leaf switch and the related path congestion data. Specifically, step 307 includes:
Obtaining the ID information of the first leaf switch from the source IP of the intermediate packet or from the ID number information of the first leaf switch;
Obtaining the path port number information of the first leaf switch;
Obtaining the congestion attributes related to the port of the first leaf switch; in this example, the bandwidth utilization information of the relevant port of the first leaf switch is obtained.
Step 307 further includes:
Obtaining the ID of the second leaf switch and the congestion information of the relevant path port; specifically, this includes:
Obtaining the port number of the path port of the second leaf switch according to the ingress port of the packet; in FIG. 5, this port number is the ID number of Port A of the second leaf switch 102;
Obtaining, according to the port number of the path port of the second leaf switch, the locally calculated outbound bandwidth utilization of that port. This bandwidth utilization identifies the congestion state of an uplink from a leaf switch to a spine switch. For example, in FIG. 3, this bandwidth utilization represents the congestion of the uplink from Port A of the leaf 3 switch to Port C of the spine 1 switch.
In step 308, the inbound bandwidth utilization of the path port of the first leaf switch is compared with the outbound bandwidth utilization of the path port of the second leaf switch, the maximum (the value indicating the most severe congestion) is taken, and that value is used as the congestion value of one path from the second leaf switch 102 to the first leaf switch 101. The path is uniquely determined by the path port of the second leaf switch (the path head node) and the path port of the first leaf switch (the path tail node).
Combining the ID information and path port number of the first leaf switch with the ID information and path port number of the second leaf switch makes it possible to index the corresponding entry in the local congestion state table; see Table 1. The column coordinate of Table 1 represents all leaf switches in the network (for example, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107), and the horizontal coordinate represents the different paths from the local leaf switch (for example, the first leaf switch 101) to the other leaf switches. Each path can be uniquely identified by its path head node (for example, Port A of the first leaf switch 101) and its path tail node (for example, Port A of the second leaf switch 102). The congestion state of a path can be expressed by the outbound port utilization of the path head node and the inbound port utilization of the path tail node. In the embodiments of the present application, the maximum of the two port utilizations is taken as the congestion information of the whole path. For example, in FIG. 5, for a path from the second leaf switch 102 to the first leaf switch 101, the path head node is Port A of the second leaf switch 102 and the path tail node is Port A of the first leaf switch 101. Assuming that the bandwidth utilization of Port A of the second leaf switch 102 (outbound at the path head node) is 0.3 and that of Port A of the first leaf switch 101 (inbound at the path tail node) is 0.7, the overall bandwidth utilization of the path is 0.7, the maximum of 0.3 and 0.7.
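The path calculation and table update described above can be sketched as follows, using the 0.3 and 0.7 values from the example. The `CongestionTable` class, its key layout, and the `least_congested` helper are assumptions for illustration; the source only specifies that an entry is indexed by the two switch IDs and the two path ports and holds the maximum of the two utilizations.

```python
def path_congestion(head_out_util, tail_in_util):
    """Congestion of a path is the larger of the two port utilizations."""
    return max(head_out_util, tail_in_util)


class CongestionTable:
    """Hypothetical local congestion state table on one leaf switch."""

    def __init__(self):
        # Key: (remote leaf, local head port, remote tail port) -> congestion value
        self._entries = {}

    def update(self, remote_leaf, head_port, tail_port, head_out_util, tail_in_util):
        value = path_congestion(head_out_util, tail_in_util)
        self._entries[(remote_leaf, head_port, tail_port)] = value
        return value

    def least_congested(self, remote_leaf):
        """Return the local head port of the least-congested known path, or None."""
        candidates = {k: v for k, v in self._entries.items() if k[0] == remote_leaf}
        if not candidates:
            return None
        best = min(candidates, key=candidates.get)
        return best[1]


table = CongestionTable()
# Path head: Port A of leaf 102 (outbound 0.3); path tail: Port A of leaf 101 (inbound 0.7).
value = table.update("leaf101", "Port A", "Port A", 0.3, 0.7)
```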
The second leaf switch 102 thus updates the related path information. If there are multiple pieces of congestion information, the second leaf switch 102 can parse them one by one following the steps above, eventually obtaining the state of multiple end-to-end paths from the second leaf switch 102 to the first leaf switch 101. The state information of these paths is updated in the local path congestion state table.
In step 309, the second leaf switch removes the congestion information from the intermediate packet to recover the original packet and forwards the original packet to the destination server through Port C, so that the destination server 106 can receive the original packet.
Specifically, the processing on the second leaf switch 102 is illustrated by example. In FIG. 5, the second leaf switch 102 receives the intermediate packet on Port A and sends the original packet, with the congestion information removed, out of Port C. The second leaf switch 102 inspects the intermediate packet, and if it finds one or more inserted pieces of congestion information, it parses them as follows. The second leaf switch 102 parses the congestion information to obtain the first leaf switch ID, the ID of Port A of the first leaf switch 101, and the inbound bandwidth utilization of that Port A, assumed here to be 0.3. The second leaf switch 102 obtains its own switch ID, the ID of its Port A (the ingress port of the packet), and the outbound bandwidth utilization of that Port A, assumed here to be 0.7. The second leaf switch 102 calculates the congestion value of one path from itself to the first leaf switch 101 by comparing the outbound bandwidth utilization of its own Port A (0.7) with the inbound bandwidth utilization of Port A of the first leaf switch 101 (0.3), and concludes that the path congestion value from the second leaf switch 102 to the first leaf switch 101 is 0.7. The path head node of this path is Port A of the second leaf switch 102, and the path tail node is Port A of the first leaf switch 101. The second leaf switch 102 then updates the local path congestion state table: using the first leaf switch ID, the ID of Port A of the first leaf switch 101, the second leaf switch ID, and the ID of Port A of the second leaf switch 102, it locates the corresponding path entry in the local path congestion state table and updates the congestion value of that path to 0.7.
In an actual scenario, when a server under the first leaf switch 101 (for example, the source server 105) sends packets to a server under the second leaf switch 102 (for example, the destination server 106), part of the network traffic is forwarded along other paths. As shown in FIG. 7, the source server 105 sends a packet to the destination server 106, and the packet is assumed to travel along the following path: Port C of the first leaf switch 101 -> Port B of the first leaf switch 101 -> Port A of the second spine switch 104 -> Port C of the second spine switch 104 -> Port B of the second leaf switch 102 -> Port C of the second leaf switch 102. In this case:
The first leaf switch 101 repeats steps 302 to 304 above, the second spine switch 104 repeats step 305 above, and the second leaf switch 102 repeats steps 306 to 309 above, so that the second leaf switch 102 finally obtains the congestion state of another end-to-end path from itself to the first leaf switch 101. The path head node of this path is Port B of the second leaf switch 102, and the path tail node is Port B of the first leaf switch 101.
In the above embodiment, as long as there is continuous traffic in the network and that traffic travels along all of the paths from the first leaf switch 101 to the second leaf switch 102, the second leaf switch 102 can obtain and continuously update the congestion state of the end-to-end paths from itself to the first leaf switch 101.
The embodiments of the present application use flow-based multipath forwarding, and all leaf switches need to support multipath forwarding based on flowlets (that is, small flows). When the first leaf switch 101 sends a packet to the second leaf switch 102, it can select the least-congested path for forwarding on a per-flowlet basis. Specifically:
When the first leaf switch 101 receives a packet, it calculates the flow identifier of the packet according to a preset rule and uses the flow identifier to index the corresponding entry in the flow forwarding table. The preset rule may be, for example, to use a hash value computed over the five-tuple of the packet as the flow identifier;
If the entry is invalid:
The first leaf switch 101 looks up the path congestion state table according to the destination address; this table stores the congestion information of all end-to-end paths from the first leaf switch 101 to the second leaf switch 102 (which can be derived from the destination address of the packet); for the content of the path congestion state table, see the description of Table 1 above;
Based on the information in the path congestion state table, the first leaf switch 101 finds the least-congested egress port, writes that egress port into the flow forwarding table, and marks the entry as valid;
The first leaf switch 101 sends the packet out of that port;
If the entry is valid, the leaf switch forwards the packet directly through the egress port recorded in the flow forwarding table.
The flow forwarding table in the embodiments of the present application has an aging mechanism. If an entry in the flow forwarding table is not refreshed by any packet within time T, the entry is set to invalid.
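The flowlet lookup and aging behaviour described above can be sketched as follows. The `FlowTable` class, the use of CRC32 over the five-tuple as the hash, the table size, and the `pick_least_congested` callback are all assumptions for illustration; the source only requires some hash of the five-tuple and an aging time T.

```python
import zlib


class FlowTable:
    """Hypothetical flow forwarding table with aging, indexed by a five-tuple hash."""

    def __init__(self, aging_seconds, size=1024):
        self.aging = aging_seconds   # the aging time T
        self.size = size
        self.entries = {}            # index -> (egress port, time of last refresh)

    def flow_id(self, five_tuple):
        # Preset rule (assumed): CRC32 of the five-tuple, folded into the table size.
        return zlib.crc32(repr(five_tuple).encode()) % self.size

    def egress_port(self, five_tuple, now, pick_least_congested):
        idx = self.flow_id(five_tuple)
        entry = self.entries.get(idx)
        if entry is not None and now - entry[1] <= self.aging:
            port = entry[0]                  # valid entry: keep the flowlet on its port
        else:
            port = pick_least_congested()    # invalid or aged out: consult congestion table
        self.entries[idx] = (port, now)      # every packet refreshes the entry
        return port


flows = FlowTable(aging_seconds=1.0)
flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
first = flows.egress_port(flow, 0.0, lambda: "Port A")   # new flowlet
second = flows.egress_port(flow, 0.5, lambda: "Port B")  # refreshed within T
third = flows.egress_port(flow, 2.0, lambda: "Port B")   # gap exceeds T, re-selected
```

Keeping a live flowlet pinned to its port avoids reordering within the burst, while the idle gap gives a safe point at which to move the flow to a less congested path.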
The embodiments of the present application are based on a two-layer spine-leaf network. Leaf switches or spine switches calculate the congestion information in the network and insert it into normal original packets to obtain intermediate packets; the second leaf switch receives an intermediate packet and parses the congestion information from it, so that every leaf switch eventually obtains the congestion information of the links from itself to the other leaf switches. Based on this congestion information, each leaf switch calculates the end-to-end least-congested path and updates its own forwarding entries. When forwarding traffic, the leaf switch forwards it along the least-congested path on a per-flowlet basis, which ultimately improves the throughput of the entire network.
FIG. 8 is a flowchart of the additional steps in a second application scenario of the method for collecting congestion information and the method for determining an optimal path provided by an embodiment of the present application. In FIG. 8, by way of example, leaf switches serve as the congestion information calculation nodes, and a timer is configured for each combination of a network-side port of the local switch and another leaf switch. Unlike FIG. 4, the optimal path determination method for load balancing shown in FIG. 8 adds at least steps 401 to 403 on the basis of FIG. 4.
The optimal path determination method for load balancing in the second application scenario addresses the case in which the source server 105 under the first leaf switch 101 sends no packets to the destination server 106 under the second leaf switch 102 for a long time, leaving the second leaf switch 102 unable to update the congestion state of the paths from itself to the first leaf switch 101 in a timely manner. In this embodiment, building on the embodiment of the first application scenario, one timer may be configured on a leaf switch for each combination of a network-side port of the local switch and another leaf switch. When a timer expires, the first leaf switch 101 proactively sends a congestion packet from the corresponding network-side port to the corresponding leaf switch.
For the process of calculating the congestion information, refer to the steps in the embodiment of the first application scenario above; the details are not repeated in this embodiment.
The optimal path determination method of the second application scenario differs from that of the first application scenario in the processing on the first leaf switch 101: timers are configured to decide whether a packet should be sent proactively.
Step 401: Configure timers for the corresponding leaf switch.
Specifically, on a leaf switch, one timer is configured for each combination of two parameters: a network-side port of the local switch and another leaf switch. For example, in the network topology of FIG. 5, four timers need to be configured on the first leaf switch 101. The relationship between the timers, the local network-side ports, and the other leaf switches is shown in Table 2 below: Timer 1 corresponds to Port A of the first leaf switch 101 and the third leaf switch 107; Timer 2 corresponds to Port A of the first leaf switch 101 and the second leaf switch 102; Timer 3 corresponds to Port B of the first leaf switch 101 and the third leaf switch 107; Timer 4 corresponds to Port B of the first leaf switch 101 and the second leaf switch 102.
Timer      Network-side port   Other leaf switch ID information
Timer 1    Port A              Third leaf switch 107
Timer 2    Port A              Second leaf switch 102
Timer 3    Port B              Third leaf switch 107
Timer 4    Port B              Second leaf switch 102
Table 2
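As a minimal sketch of the timer dimensioning described in step 401, the following Python snippet builds one timer slot per (network-side port, other leaf switch) pair. The function name `build_timer_table`, the port labels, and the `leaf-107`/`leaf-102` identifiers are illustrative assumptions and do not appear in the application.

```python
from itertools import product


def build_timer_table(network_ports, other_leaves):
    """Return {timer_number: (port, leaf_id)}, one timer per (port, leaf) pair."""
    pairs = product(network_ports, other_leaves)
    return {number: pair for number, pair in enumerate(pairs, start=1)}


# Hypothetical identifiers mirroring Table 2 for the first leaf switch 101.
timers = build_timer_table(["Port A", "Port B"], ["leaf-107", "leaf-102"])
for number, (port, leaf) in timers.items():
    print(f"Timer {number}: {port} -> {leaf}")
```

With two network-side ports and two other leaf switches the table has four entries, which matches the four timers of Table 2; in general the count is the product of the two dimensions.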
Step 402: Reset the corresponding timer.
Specifically, when the local leaf switch sends a packet from one of its network-side ports to another leaf switch, the associated timer is reset to zero and restarted. For example, in Table 2, when the first leaf switch 101 sends a packet from its Port A to the second leaf switch 102, Timer 2 is reset and restarted.
Step 403: If a timer expires, the local leaf switch sends a first data packet to the corresponding other leaf switch.
Specifically, if a timer expires, the local leaf switch actively sends a first data packet through the network-side port associated with the timer to the leaf switch associated with the timer. In an embodiment, the destination IP of the first data packet may be the IP address of the associated leaf switch itself. The remaining content of the first data packet may be freely defined, but the packet includes at least the congestion information for the inbound direction of the port associated with the timer.
For example, in an embodiment the timers are configured as shown in Table 2. If Timer 2 expires, then because Timer 2 is associated with the second leaf switch 102 and with the local network-side interface Port A, the first leaf switch 101 actively sends the first data packet from its Port A to the second leaf switch 102. The destination IP address of this first data packet is the IP address of the second leaf switch 102 itself, and the packet must include the congestion information for the inbound direction of Port A of the first leaf switch 101.
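Steps 402 and 403 can be sketched as follows. This is a hedged illustration only: the class `PathTimer`, the function `on_tick`, the dictionary packet layout, and the decision to restart the timer after sending the probe are assumptions made for the example, not details stated in the application. Time is passed in explicitly so the logic can be followed without a real clock.

```python
class PathTimer:
    """Idle timer for one (network-side port, peer leaf switch) pair."""

    def __init__(self, port, peer_leaf, timeout):
        self.port = port
        self.peer_leaf = peer_leaf
        self.timeout = timeout
        self.started_at = 0.0

    def reset(self, now):
        # Step 402: traffic left this port towards this leaf, so restart the timer.
        self.started_at = now

    def expired(self, now):
        return now - self.started_at >= self.timeout


def on_tick(timer, now, inbound_congestion):
    """Step 403: on expiry, build the first data packet; otherwise return None."""
    if not timer.expired(now):
        return None
    timer.reset(now)  # assumption: the probe itself counts as traffic on the path
    return {
        "dst": timer.peer_leaf,                         # the peer leaf switch itself
        "out_port": timer.port,                         # port associated with the timer
        "congestion": inbound_congestion[timer.port],   # inbound direction of that port
    }


timer2 = PathTimer("Port A", "leaf-102", timeout=5.0)
timer2.reset(now=1.0)                               # a normal packet was sent at t = 1 s
quiet = on_tick(timer2, 3.0, {"Port A": 0.42})      # timer still running
probe = on_tick(timer2, 6.0, {"Port A": 0.42})      # 5 s of silence: send the probe
```

Here `quiet` is `None` because only two seconds have passed, while `probe` carries the inbound congestion value of Port A towards the second leaf switch.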
FIG. 9 is a flowchart of a third application scenario of the method for collecting congestion information and the method for determining an optimal path provided by the embodiments of the present application. In FIG. 9, a spine switch serves as the congestion information calculation node, and bandwidth utilization is used as the criterion for judging path congestion. Unlike FIG. 4, the method for determining an optimal path for load balancing shown in FIG. 9 includes at least steps 501 to 510.
In the third application scenario, with reference to FIG. 5, the source server 105 sends packets to the destination server 106. All spine switches are congestion information calculation nodes and must enable the congestion information calculation function; that is, both the first spine switch 103 and the second spine switch 104 are congestion information calculation nodes and must enable the function. The leaf switches act only as forwarding nodes and need not enable the congestion information calculation function; that is, the first leaf switch 101, the second leaf switch 102, and the third leaf switch 107 need only forward packets normally.
Step 501: The source server sends an original packet.
Step 502: The first leaf switch forwards the original packet normally.
Step 503: The spine switch obtains congestion-related data.
Step 504: The spine switch calculates congestion information from the congestion-related data.
Step 505: The spine switch inserts the congestion information to obtain an intermediate packet.
Step 506: The second leaf switch determines whether the intermediate packet contains congestion information; if so, step 507 is performed; otherwise, step 511 is performed. Specifically, the second leaf switch makes this determination after receiving the intermediate packet forwarded by the spine switch.
Step 507: The second leaf switch parses the congestion information. Specifically, after parsing, the second leaf switch obtains the ID information of the first leaf switch, the path port information of the first leaf switch and the related path congestion data, the ID information of the second leaf switch, and the path port information of the second leaf switch and the related path congestion data.
Step 508: The second leaf switch calculates the congestion information of one of the end-to-end paths from the second leaf switch to the first leaf switch and updates the local congestion state table accordingly.
Step 509: The second leaf switch deletes the congestion information to recover the original packet and forwards the original packet to the destination server.
Step 510: The destination server receives the original packet.
Step 511: The second leaf switch forwards the intermediate packet normally.
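The branch taken by the second leaf switch in steps 506 to 511 can be sketched as below. The `Packet` dataclass, the field names inside the congestion record, and the four-part table key are hypothetical choices for this illustration; the application does not specify an encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    congestion: Optional[dict] = None   # in-band record added by the spine switch, if any


def handle_at_second_leaf(packet, state_table):
    """Apply steps 506-511 and return the packet that is forwarded onward."""
    if packet.congestion is None:
        # Step 511: no congestion information, forward unchanged.
        return packet
    info = packet.congestion                                  # step 507: parse
    key = (info["src_leaf"], info["src_port"], info["dst_leaf"], info["dst_port"])
    state_table[key] = info["value"]                          # step 508: update table
    return Packet(payload=packet.payload)                     # step 509: strip, forward


table = {}
marked = Packet(b"data", {"src_leaf": 101, "src_port": "A",
                          "dst_leaf": 102, "dst_port": "A", "value": 0.7})
forwarded = handle_at_second_leaf(marked, table)
```

After the call, `forwarded` no longer carries a congestion record and `table` holds one entry for the path identified by the two leaf switches and their path ports.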
Enabling the congestion information calculation function in the third application scenario is similar to the first application scenario. The difference is that in the third application scenario the spine switch performs the congestion information calculation. Following the principle shown in FIG. 6, in this embodiment a user can enable the congestion information calculation function on the spine switch through configuration.
The congestion information may be bandwidth utilization information of a network-side port (in the outbound or inbound direction) or any other information that can identify the congestion state of a port or link.
In step 501, the original packet sent by the source server 105 to the destination server 106 is assumed to be forwarded along the following path, as shown in FIG. 5: Port C of the first leaf switch 101 -> Port A of the first leaf switch 101 -> Port A of the first spine switch 103 -> Port C of the first spine switch 103 -> Port A of the second leaf switch 102 -> Port C of the second leaf switch 102.
In step 502, the first leaf switch 101 forwards the original packet normally. As shown in FIG. 5, the first leaf switch 101 receives the original packet on its Port C, forwards it normally, and sends it out of its Port A.
In step 503, the congestion-related data includes the ingress port information and egress port information of the original packet. While forwarding the original packet normally, the spine switch can obtain both. In this example, the ingress port is the port on which the packet enters the spine switch, and the egress port is the port on which the packet leaves the spine switch. In FIG. 5, the ingress port of the spine switch is Port A and the egress port is Port C.
The congestion-related data further includes the ID information and related path port information of the first leaf switch. Step 503 includes:
Obtaining the ID information and related path port information of the first leaf switch. In this embodiment, the path port of the first leaf switch is the port of the first leaf switch that is directly connected to the ingress port of the packet on the spine switch. The spine switch can obtain this information through static configuration or a dynamic protocol (for example, LLDP). In FIG. 5, Port A of the first spine switch 103 is the ingress port of the packet, and Port A of the first leaf switch 101 is the path port of the first leaf switch.
The congestion-related data further includes the congestion value in the outbound direction of the packet's ingress port. Step 503 further includes:
Obtaining the congestion value in the outbound direction of the packet's ingress port. Based on the ID of the ingress port and the locally calculated congestion results, the spine switch can obtain the congestion information for the outbound direction of that port, which indicates the congestion of a downlink from the spine switch to a leaf switch. In this embodiment, the information may be the bandwidth utilization of the port in the outbound direction. In FIG. 5, Port A is the ingress port of the packet on the first spine switch 103, and the congestion information for its outbound direction is that of the downlink from Port A of the first spine switch 103 to Port A of the first leaf switch 101.
The congestion-related data further includes the ID information and related path port information of the second leaf switch. Step 503 further includes:
Obtaining the ID information and related path port information of the second leaf switch. In this embodiment, the path port of the second leaf switch is the port of the second leaf switch 102 that is directly connected to the egress port of the packet on the spine switch. The spine switch can obtain this information through static configuration or a dynamic protocol (for example, LLDP). In FIG. 5, Port C of the first spine switch 103 is the egress port of the packet, and Port A of the second leaf switch 102 is the path port of the second leaf switch.
The congestion-related data further includes the congestion value in the inbound direction of the packet's egress port. Step 503 further includes:
Obtaining the congestion value in the inbound direction of the packet's egress port. Based on the ID of the egress port and the locally calculated congestion results, the spine switch can obtain the congestion information for the inbound direction of that port, which indicates the congestion of an uplink from a leaf switch to the spine switch. In this embodiment, the information may be the bandwidth utilization of the port in the inbound direction. In FIG. 5, Port C is the egress port of the packet on the first spine switch 103, and the congestion information for its inbound direction is that of the uplink from Port A of the second leaf switch 102 to Port C of the first spine switch 103.
Step 504 includes:
The spine switch calculates, from the congestion-related data, the congestion value of one of the paths from the second leaf switch to the first leaf switch. Specifically, the spine switch compares the congestion value in the outbound direction of the packet's ingress port with the congestion value in the inbound direction of the packet's egress port and selects the larger value (the one indicating the most severe congestion) as the congestion value of that path from the second leaf switch 102 to the first leaf switch 101. This value is inserted into the original packet as congestion information, yielding an intermediate packet that includes the congestion information. In this embodiment, the path is uniquely determined by the ID information and path port of the second leaf switch together with the ID information and path port of the first leaf switch.
Further, the spine switch identifies the end-to-end path from the second leaf switch to the first leaf switch, to which the congestion value applies, by the following congestion-related data: the ID of the first leaf switch, the path port ID of the first leaf switch, the ID of the second leaf switch, and the path port number of the second leaf switch. The calculated end-to-end path is uniquely determined by the ID information and path port of the second leaf switch together with the ID information and path port of the first leaf switch.
The processing on the first spine switch is illustrated below with an example. In FIG. 5, the first spine switch 103 receives the packet on its Port A and sends the packet out of its Port C.
The first spine switch 103 uses the packet's ingress port Port A to obtain the ID information of the first leaf switch and the ID of the first leaf switch's path port Port A;
The first spine switch 103 uses the packet's ingress port Port A to obtain the bandwidth utilization in the outbound direction of Port A;
The first spine switch 103 uses the packet's egress port Port C to obtain the ID information of the second leaf switch and the ID of the second leaf switch's path port Port A;
The first spine switch 103 uses the packet's egress port Port C to obtain the bandwidth utilization in the inbound direction of Port C;
The first spine switch 103 compares the bandwidth utilization in the outbound direction of Port A with the bandwidth utilization in the inbound direction of Port C and takes the larger as the congestion value of the path;
The first spine switch 103 inserts the relevant congestion values into the original packet as congestion information. Specifically, the relevant congestion values may include: the ID of the first leaf switch (in FIG. 5, the ID of the first leaf switch 101); the path port number information of the first leaf switch (in FIG. 5, the ID of Port A of the first leaf switch 101); the ID of the second leaf switch (in FIG. 5, the ID of the second leaf switch 102); and the path port number information of the second leaf switch (in FIG. 5, the ID of Port A of the second leaf switch 102).
The relevant congestion values may also include the congestion value of a path from the second leaf switch to the first leaf switch. That path is uniquely determined by Port A of the leaf3 switch and Port A of the leaf1 switch.
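The spine-side calculation walked through above reduces to taking the larger of two link utilizations and packaging it with the identifiers of the two leaf-side path ports. The sketch below illustrates this under stated assumptions: the function names, the `neighbors` map (which the text says may come from static configuration or LLDP), the record field names, and all utilization figures are invented for the example.

```python
def path_congestion(ingress_out_util, egress_in_util):
    """Congestion of the reverse path: the worse (larger) of the two link utilizations."""
    return max(ingress_out_util, egress_in_util)


def build_congestion_info(neighbors, utilization, ingress_port, egress_port):
    """Assemble the in-band record a spine switch would insert (illustrative fields)."""
    src_leaf, src_port = neighbors[ingress_port]   # first leaf switch and its path port
    dst_leaf, dst_port = neighbors[egress_port]    # second leaf switch and its path port
    value = path_congestion(utilization[ingress_port]["out"],
                            utilization[egress_port]["in"])
    return {"src_leaf": src_leaf, "src_port": src_port,
            "dst_leaf": dst_leaf, "dst_port": dst_port, "value": value}


# Hypothetical data for the first spine switch 103 in FIG. 5.
neighbors = {"Port A": (101, "Port A"), "Port C": (102, "Port A")}
utilization = {"Port A": {"in": 0.20, "out": 0.35},
               "Port C": {"in": 0.60, "out": 0.10}}
info = build_congestion_info(neighbors, utilization, "Port A", "Port C")
```

With these figures the uplink into Port C (0.60) is busier than the downlink out of Port A (0.35), so the record carries 0.60 as the congestion value of the path from the second leaf switch back to the first.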
With reference to FIG. 5, step 506 includes:
The second leaf switch 102 receives the intermediate packet on its Port A;
The second leaf switch 102 inspects the intermediate packet and determines whether it contains inserted congestion information.
If step 506 determines that the intermediate packet contains inserted congestion information, step 507 is performed. Specifically, step 507 includes:
Obtaining the ID information of the first leaf switch from the source IP of the packet or from the first leaf switch ID information;
Obtaining the ID information of the path port of the first leaf switch from the congestion information;
Obtaining the ID information of the second leaf switch from the congestion information or locally;
Obtaining the ID information of the path port of the second leaf switch from the congestion information or from the ingress port of the packet.
Step 508 includes:
The second leaf switch 102 combines the ID information and path port number information of the first leaf switch with the ID information and path port number information of the second leaf switch to index the entry for the corresponding path in the local congestion state table, whose format is shown in Table 1 above;
The second leaf switch 102 updates that path entry.
In this embodiment of the present application, when the source server 105 under the first leaf switch 101 sends packets to the destination server 106 under the second leaf switch 102, some traffic may be forwarded along another path, as shown in FIG. 7: Port C of the first leaf switch 101 -> Port B of the first leaf switch 101 -> Port A of the second spine switch 104 -> Port C of the second spine switch 104 -> Port B of the second leaf switch 102 -> Port C of the second leaf switch 102. By repeating steps 501 to 510, the second leaf switch 102 finally obtains the congestion state of this end-to-end path from itself to the first leaf switch 101. The path is determined by the uplink from Port B of the second leaf switch 102 to Port C of the second spine switch 104 and the downlink from Port A of the second spine switch 104 to Port B of the first leaf switch 101.
Through the above process, the second leaf switch 102 obtains the congestion state of all end-to-end paths from itself to the first leaf switch 101.
FIG. 10 is a flowchart of a fourth application scenario of the method for collecting congestion information and the method for determining an optimal path provided by the embodiments of the present application. In FIG. 10, a spine switch serves as the congestion information calculation node, and each leaf switch configures timers per combination of a network-side port of the local switch and another leaf switch. Compared with FIG. 9, FIG. 10 adds at least steps 601 to 603.
In the fourth application scenario, a server under the first leaf switch may send no packets to a server under the second leaf switch for a long time, in which case the second leaf switch cannot update the congestion state of its paths to the first leaf switch in time. To prevent this, this embodiment builds on the third application scenario and configures, on each leaf switch, one timer per combination of a network-side port of the local switch and another leaf switch. If a timer expires, the first leaf switch actively sends a congestion packet from the associated network-side port to the associated leaf switch.
Step 601: Configure timers for the corresponding leaf switch.
Step 601 is similar to step 401. Specifically, on a leaf switch, one timer is configured for each combination of two parameters: a network-side port of the local switch and another leaf switch. For example, in the network topology of FIG. 5, four timers need to be configured on the first leaf switch 101. The relationship between the timers, the local network-side ports, and the other leaf switches is shown in Table 2 above: Timer 1 corresponds to Port A of the first leaf switch 101 and the third leaf switch 107; Timer 2 corresponds to Port A of the first leaf switch 101 and the second leaf switch 102; Timer 3 corresponds to Port B of the first leaf switch 101 and the third leaf switch 107; Timer 4 corresponds to Port B of the first leaf switch 101 and the second leaf switch 102.
Step 602: Reset the corresponding timer.
Step 602 is similar to step 402. Specifically, when the local leaf switch sends a packet from one of its network-side ports to another leaf switch, the associated timer is reset to zero and restarted. For example, in Table 2, when the first leaf switch 101 sends a packet from its Port A to the second leaf switch 102, Timer 2 is reset and restarted.
Step 603: If a timer expires, the local leaf switch sends a second data packet to the corresponding other leaf switch.
Step 603 is similar to step 403, except that step 403 sends a first data packet whereas step 603 sends a second data packet. Specifically, if a timer expires, the local leaf switch has not sent any packet from the associated network-side port to the associated leaf switch within the time T. The local leaf switch then actively sends a second data packet through the network-side port associated with the timer to the leaf switch associated with the timer. In an embodiment, the destination IP of the second data packet may be the IP address of the associated leaf switch itself. Unlike the second application scenario, in which the first data packet includes at least the congestion information for the inbound direction of the port associated with the timer, the second data packet carries no congestion information; its other content may be freely defined.
For example, in an embodiment the timers are configured as shown in Table 2. If Timer 2 expires, then because Timer 2 is associated with the second leaf switch 102 and with the local network-side interface Port A, the first leaf switch 101 actively sends the second data packet from its Port A to the second leaf switch 102. The destination IP address of this second data packet is the IP address of the second leaf switch 102 itself, and the packet does not include congestion information.
FIG. 11 is a flowchart of the steps added in a fifth application scenario of the method for collecting congestion information and the method for determining an optimal path provided by the embodiments of the present application. In FIG. 11, the leaf switch calculates the congestion information, timers are configured on the leaf switch, and congestion information is inserted into packets at timed intervals. The method in FIG. 11 adds at least steps 701 to 703.
In the first application scenario, the leaf switch calculates the congestion information and, on receiving a packet, inserts congestion information into it whenever the egress port is determined to be a network-side port. This increases the length of every data packet, so the congestion information consumes some network bandwidth. To reduce that bandwidth, the fifth application scenario builds on the first and configures, on each leaf switch, one timer per combination of a network-side port of the local switch and another leaf switch. The first leaf switch inserts congestion information into a packet only when the timer has expired.
步骤701、给相应leaf交换机配置定时器。Step 701: Configure a timer for the corresponding leaf switch.
步骤701与步骤401类似。具体地,在leaf交换机上,以本机交换机的每个网络侧端口,其他每个leaf交换机这2个参数为维度,配置1个定时器。举例来说,在图5的网络拓扑中,在第一leaf交换机101上,需要配置4个定时器,该定时器与本机网络侧接口与其他leaf交换机的关系,可以参考上表2所示,定时器1对应第一leaf交换机101的Port A,第三leaf交换机107;定时器2对应第一leaf交换机101的Port A,第二leaf交换机102;定时器3对应第一leaf交换机101的Port A,第三leaf交换机107;定时器4对应第一leaf交换机101的Port B,第二leaf交换机102。Step 701 is similar to step 401 . Specifically, on the leaf switch, one timer is configured with two parameters of each network port of the local switch and each other leaf switch as dimensions. For example, in the network topology of FIG. 5, four timers need to be configured on the first leaf switch 101. The relationship between the timers and the local network side interface and other leaf switches can be referred to as shown in Table 2 above. , Timer 1 corresponds to Port A of the first leaf switch 101, and the third leaf switch 107; Timer 2 corresponds to Port A of the first leaf switch 101, and the second leaf switch 102; Timer 3 corresponds to the Port of the first leaf switch 101 A, the third leaf switch 107; Timer 4 corresponds to Port B of the first leaf switch 101, and the second leaf switch 102.
Step 702: If a timer reaches the preset time, it stops timing.
Step 703: The first leaf switch determines that the packet needs to be sent from a network-side port.
Specifically, in step 703:
The first leaf switch finds the ID of the corresponding second leaf switch according to the destination IP address;
The first leaf switch obtains the corresponding timer according to the network-side port ID and the second leaf switch ID; if that timer has expired, then:
The first leaf switch inserts the congestion information into the original packet;
The timer restarts.
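Steps 702 and 703 amount to a per-key rate limiter on congestion-information insertion. A minimal sketch, under stated assumptions, is given below: the class name `InsertionGate`, the `ip_to_leaf` lookup table, the timer period in seconds, and the choice to treat a never-started timer as already expired are all hypothetical and not taken from the application.

```python
import time


class InsertionGate:
    """One timer per (network-side port ID, destination leaf switch ID)."""

    def __init__(self, interval_s, clock=time.monotonic):
        self.interval_s = interval_s  # hypothetical preset timer period, in seconds
        self.clock = clock            # injectable so the logic can be tested
        self.deadline = {}            # (port_id, leaf_id) -> expiry time

    def should_insert(self, port_id, leaf_id):
        key = (port_id, leaf_id)
        now = self.clock()
        # Assumption: a timer that was never started counts as already expired.
        if now >= self.deadline.get(key, now):
            self.deadline[key] = now + self.interval_s  # restart the timer
            return True
        return False


def handle_packet(dst_ip, port_id, ip_to_leaf, gate):
    """Return True if congestion information should be inserted into this packet."""
    leaf_id = ip_to_leaf[dst_ip]  # destination IP -> second leaf switch ID
    return gate.should_insert(port_id, leaf_id)
```

Packets sent between two expirations of the same timer are forwarded unmodified, which is where the bandwidth saving relative to the first application scenario comes from.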
With the method for collecting congestion information and the method for determining the optimal path in a spine-and-leaf network described above, every leaf switch eventually obtains the congestion information of the end-to-end paths from itself to the other leaf switches and calculates the least congested path to each of them. When the first leaf switch sends packets to the second leaf switch, it can select the least congested path for forwarding on a per-flowlet basis. In the embodiments of the present application, only the nodes that calculate congestion information (leaf switches or spine switches) need to enable the congestion information calculation function; the function does not have to be deployed across the whole network, which reduces deployment cost and difficulty. Throughout packet transmission, a leaf switch only needs to insert the relevant congestion information into the original packet, so the information overhead is smaller than in schemes that construct dedicated congestion packets out of band. All calculation, processing, and parsing of congestion information is performed on the switches, so compared with an SDN solution the congestion information is updated faster and the traffic load balancing effect is more pronounced.
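Flowlet-based selection of the least congested path can be illustrated with the sketch below. It is only one plausible reading of the paragraph above: the class name `FlowletSelector`, the inactivity gap `gap_s`, and the representation of congestion as one number per egress port are assumptions, not details given in the application.

```python
class FlowletSelector:
    """Keep a flow on its current port until an idle gap starts a new flowlet."""

    def __init__(self, gap_s):
        self.gap_s = gap_s  # hypothetical idle gap that separates two flowlets
        self.state = {}     # flow_id -> (time of last packet, chosen port)

    def pick_port(self, flow_id, now, congestion_by_port):
        entry = self.state.get(flow_id)
        if entry is not None and now - entry[0] < self.gap_s:
            # Same flowlet: keep the path so packets are not reordered.
            port = entry[1]
        else:
            # New flowlet: choose the port whose path is least congested.
            port = min(congestion_by_port, key=congestion_by_port.get)
        self.state[flow_id] = (now, port)
        return port
```

Re-evaluating the path only at flowlet boundaries lets the switch react to fresh congestion information without splitting a burst of packets across paths.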
An embodiment of the present application further provides an apparatus for collecting congestion information in a spine-and-leaf network, which can implement the above method for collecting congestion information in a spine-and-leaf network. The apparatus includes:
a network-side port determining module, configured to determine a network-side port;
a first congestion information acquiring module, configured to acquire congestion information related to the network-side port;
a first path port determining module, configured to determine a path port of the first leaf switch according to a configuration policy;
a first inserting module, configured to insert the congestion information into an original packet according to the path port to obtain an intermediate packet;
a first forwarding module, configured to send out the intermediate packet.
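The role of the inserting module can be sketched as below. The packet layout here is purely illustrative: the `Packet` dataclass, its `congestion` field, and the per-port congestion dictionary are hypothetical stand-ins, since this passage does not specify how the congestion information is encoded in the packet.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    payload: bytes
    # Hypothetical in-band field: (path port, congestion value), or None.
    congestion: Optional[Tuple[str, int]] = None


def insert_congestion(original, path_port, congestion_by_port):
    """Return the intermediate packet: the original plus congestion information."""
    value = congestion_by_port[path_port]
    return replace(original, congestion=(path_port, value))


original = Packet(dst_ip="10.0.2.1", payload=b"data")
intermediate = insert_congestion(original, "Port A", {"Port A": 5, "Port B": 2})
```

The original payload is carried unchanged; only the congestion field distinguishes the intermediate packet from the original one.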
An embodiment of the present application further provides another apparatus for collecting congestion information in a spine-and-leaf network, which can implement the above method for collecting congestion information in a spine-and-leaf network. The apparatus includes:
a packet acquiring module, configured to acquire, from a network-side port, an original packet sent by the first leaf switch;
a second congestion information acquiring module, configured to acquire congestion information related to the network-side port;
a second path port determining module, configured to determine a path port of the first leaf switch;
a second inserting module, configured to insert the congestion information into the original packet according to the path port to obtain an intermediate packet;
a second forwarding module, configured to send the intermediate packet to the second leaf switch.
An embodiment of the present application further provides an apparatus for determining an optimal path in a spine-and-leaf network, which can implement the above method for determining an optimal path in a spine-and-leaf network. The apparatus includes:
a packet receiving module, configured to receive, through a spine switch, an intermediate packet sent by the first leaf switch;
a congestion information determining module, configured to determine that congestion information is present in the intermediate packet;
a parsing module, configured to parse the congestion information out of the intermediate packet;
a calculating module, configured to calculate the least congested path according to the congestion information and determine the least congested path as the optimal path.
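The parsing and calculating modules can be pictured as maintaining a small congestion table. The sketch below assumes, hypothetically, that each parsed record reduces to a (remote leaf switch, path port, congestion value) triple and that the least congested path is the port with the smallest stored value; the class name `CongestionTable` is likewise illustrative.

```python
class CongestionTable:
    """Latest congestion value per (remote leaf switch, path port)."""

    def __init__(self):
        self.table = {}  # remote_leaf -> {path_port: congestion value}

    def update(self, remote_leaf, path_port, value):
        """Record congestion information parsed from an intermediate packet."""
        self.table.setdefault(remote_leaf, {})[path_port] = value

    def best_port(self, remote_leaf):
        """Return the least congested port toward remote_leaf, or None if unknown."""
        ports = self.table.get(remote_leaf)
        if not ports:
            return None
        return min(ports, key=ports.get)
```

A later update for the same (leaf switch, port) pair overwrites the earlier value, so the chosen path follows the most recent congestion information received.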
An embodiment of the present application further provides a network switch for a spine-and-leaf network, including:
at least one memory;
at least one processor;
at least one program;
The program is stored in the memory, and the processor executes the at least one program to implement the above method for collecting congestion information in a spine-and-leaf network or the above method for determining an optimal path in a spine-and-leaf network. The network switch may be a leaf switch or a spine switch.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to execute the above method for collecting congestion information in a spine-and-leaf network or the above method for determining an optimal path in a spine-and-leaf network.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may include memory located remotely from the processor, and such remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The embodiments of the present application propose a method for collecting congestion information in a spine-and-leaf network, a method for determining an optimal path, a network switch, and a computer-readable storage medium. The method for collecting congestion information includes: determining a network-side port; acquiring congestion information related to the network-side port; determining a path port of the first leaf switch according to a configuration policy; inserting the congestion information into an original packet according to the path port to obtain an intermediate packet; and sending out the intermediate packet. With the technical solution provided in this embodiment, path congestion information can be collected and inserted into the original packet; the second leaf switch can parse out the congestion information and calculate the optimal path from it; and the leaf switches can forward traffic along the optimal path, which improves the throughput of the entire network.
The embodiments described in the present application are intended to explain the technical solutions of the embodiments more clearly and do not limit those technical solutions. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
Those skilled in the art will understand that the technical solutions shown in FIGS. 1-11 do not limit the embodiments of the present application, which may include more or fewer steps than shown, combine certain steps, or use different steps.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
Those of ordinary skill in the art will understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and devices, may be implemented as software, firmware, hardware, or appropriate combinations thereof.
The terms "first", "second", "third", "fourth", and the like (if any) in the description of the present application and in the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present application described herein can be practiced in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
It should be understood that, in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a; b; c; "a and b"; "a and c"; "b and c"; or "a and b and c", where a, b, and c may each be single or multiple.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store a program, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Some embodiments of the present application have been described above with reference to the accompanying drawings; this does not limit the scope of rights of the embodiments of the present application. Any modifications, equivalent replacements, and improvements made by those skilled in the art without departing from the scope and essence of the embodiments of the present application shall fall within the scope of rights of the embodiments of the present application.

Claims (10)

  1. A method for collecting congestion information in a spine-and-leaf network, comprising:
    determining a network-side port;
    acquiring congestion information related to the network-side port;
    determining a path port of a first leaf switch according to a configuration policy;
    inserting the congestion information into an original packet according to the path port to obtain an intermediate packet;
    sending out the intermediate packet.
  2. The method according to claim 1, further comprising:
    configuring a timer for the corresponding leaf switch;
    if a packet is sent from the network-side port to a second leaf switch, resetting the timer to zero.
  3. The method according to claim 2, further comprising:
    if the timer expires, sending a first data packet from the network-side port corresponding to the timer to the second leaf switch corresponding to the timer, wherein the first data packet includes the congestion information.
  4. The method according to claim 1, further comprising:
    configuring a timer for the corresponding leaf switch;
    determining that the original packet is to be sent from the network-side port;
    determining ID information of the second leaf switch;
    if the timer has expired, inserting the congestion information into the original packet according to ID information of the network-side port and the ID information of the second leaf switch to obtain the intermediate packet.
  5. The method according to any one of claims 1 to 4, wherein determining the path port of the first leaf switch according to the configuration policy comprises:
    determining the egress port of the original packet, or a network-side port of the first leaf switch, as the path port of the first leaf switch;
    or,
    polling all network-side ports of the first leaf switch to select one of the network-side ports as the path port of the first leaf switch.
  6. A method for determining an optimal path in a spine-and-leaf network, comprising:
    receiving, through a spine switch, an intermediate packet sent by a first leaf switch;
    determining that congestion information is present in the intermediate packet;
    parsing the congestion information out of the intermediate packet;
    calculating a least congested path according to the congestion information, and determining the least congested path as the optimal path.
  7. The method according to claim 6, further comprising:
    configuring a timer for the corresponding leaf switch;
    if the timer expires, sending a second data packet from the network-side port corresponding to the timer to the second leaf switch corresponding to the timer, wherein the second data packet does not include the congestion information.
  8. A method for collecting congestion information in a spine-and-leaf network, comprising:
    acquiring, from a network-side port, an original packet sent by a first leaf switch;
    acquiring congestion information related to the network-side port;
    determining a path port of the first leaf switch;
    inserting the congestion information into the original packet according to the path port to obtain an intermediate packet;
    sending the intermediate packet to a second leaf switch.
  9. A network switch for a spine-and-leaf network, comprising:
    at least one memory;
    at least one processor;
    at least one program;
    wherein the program is stored in the memory, and the processor executes the at least one program to implement:
    the method according to any one of claims 1 to 5;
    or,
    the method according to any one of claims 6 to 7.
  10. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to cause a computer to execute:
    the method according to any one of claims 1 to 5;
    or,
    the method according to any one of claims 6 to 7;
    or,
    the method according to claim 8.
PCT/CN2021/113568 2020-10-12 2021-08-19 Congestion information collection method, optimal path determination method, and network switch WO2022078063A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011083906.6 2020-10-12
CN202011083906.6A CN112787925B (en) 2020-10-12 2020-10-12 Congestion information collection method, optimal path determination method and network switch

Publications (1)

Publication Number Publication Date
WO2022078063A1 true WO2022078063A1 (en) 2022-04-21

Family

ID=75750468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113568 WO2022078063A1 (en) 2020-10-12 2021-08-19 Congestion information collection method, optimal path determination method, and network switch

Country Status (2)

Country Link
CN (1) CN112787925B (en)
WO (1) WO2022078063A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112787925B (en) * 2020-10-12 2022-07-19 中兴通讯股份有限公司 Congestion information collection method, optimal path determination method and network switch
CN115348210A (en) * 2022-06-21 2022-11-15 深圳市高德信通信股份有限公司 Delay optimization method based on edge calculation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150146526A1 (en) * 2013-11-27 2015-05-28 Cisco Technologies Inc. Network congestion management using flow rebalancing
CN106470116A (en) * 2015-08-20 2017-03-01 ***通信集团公司 A kind of Network Fault Detection and restoration methods and device
CN107634912A (en) * 2016-07-19 2018-01-26 华为技术有限公司 Load-balancing method, device and equipment
CN109691037A (en) * 2016-09-12 2019-04-26 华为技术有限公司 Method and system for data center's load balancing
CN110351286A (en) * 2019-07-17 2019-10-18 东北大学 Link flood attack detects response mechanism in a kind of software defined network
CN112787925A (en) * 2020-10-12 2021-05-11 中兴通讯股份有限公司 Congestion information collection method, optimal path determination method and network switch

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9246818B2 (en) * 2013-06-24 2016-01-26 Cisco Technology, Inc. Congestion notification in leaf and spine networks
US20170118108A1 (en) * 2015-10-27 2017-04-27 Futurewei Technologies, Inc. Real Time Priority Selection Engine for Improved Burst Tolerance
CN106911584B (en) * 2015-12-23 2020-04-14 华为技术有限公司 Flow load sharing method, device and system based on leaf-ridge topological structure
CN106998302B (en) * 2016-01-26 2020-04-14 华为技术有限公司 Service flow distribution method and device
US10454830B2 (en) * 2016-05-05 2019-10-22 City University Of Hong Kong System and method for load balancing in a data network
CN108234320B (en) * 2016-12-14 2021-07-09 华为技术有限公司 Message transmission method and switch
CN108243111B (en) * 2016-12-27 2021-08-27 华为技术有限公司 Method and device for determining transmission path
CN108667739B (en) * 2017-03-27 2020-12-08 华为技术有限公司 Congestion control method, device and system
US10454839B1 (en) * 2018-05-15 2019-10-22 Cisco Technology, Inc. Deadlock avoidance in leaf-spine networks
CN111224888A (en) * 2018-11-27 2020-06-02 华为技术有限公司 Method for sending message and message forwarding equipment
CN109802879B (en) * 2019-01-31 2021-05-28 新华三技术有限公司 Data stream routing method and device
CN111225031B (en) * 2019-12-17 2020-12-18 长沙星融元数据技术有限公司 Cloud data center virtual bottom layer network architecture and data transmission method thereof

Also Published As

Publication number Publication date
CN112787925B (en) 2022-07-19
CN112787925A (en) 2021-05-11

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 05/09/2023)