WO2024119513A1 - Device and method for agent for dynamically adapting explicit congestion notification configuration in network system

Info

Publication number
WO2024119513A1
Authority
WO
WIPO (PCT)
Prior art keywords
agent
switch port
neighbor
property
message
Prior art date
Application number
PCT/CN2022/138121
Other languages
French (fr)
Inventor
Guillermo Bernardez GIL
Jose SUAREZ-VARELA
Albert CABELLOS-APARICIO
Pere BARLET-ROS
Xiangle CHENG
Xiang Shi
Shihan XIAO
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2022/138121
Publication of WO2024119513A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/046 Network management architectures or arrangements comprising network management agents or mobile agents therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0813 Configuration setting characterised by the conditions triggering a change of settings
    • H04L41/082 Configuration setting characterised by the conditions triggering a change of settings the condition being updates or upgrades of network functionality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H04L43/0864 Round trip delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H04L43/0894 Packet rate

Definitions

  • the present disclosure relates to an agent for dynamically adapting an Explicit Congestion Notification (ECN) configuration in a network system, the network system comprising a plurality of agents, and each agent being associated with a switch port.
  • ECN Explicit Congestion Notification
  • the disclosure further provides a corresponding method and a computer program to perform the method.
  • Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCNs).
  • DCNs implement two main CC protocols: Data Center Transmission Control Protocol (DCTCP) and Data Center Quantized Congestion Notification (DCQCN).
  • DCTCP Data Center Transmission Control Protocol
  • DCQCN Data Center Quantized Congestion Notification
  • Both protocols, as well as their main variants, are based on ECN, wherein intermediate switches mark packets when they detect congestion.
  • ECN configuration is thus crucial to the performance of widely deployed CC protocols.
  • CC has been extensively studied. As a result, there exist a number of solutions for DCNs tackling the problem from different angles, such as round-trip time (RTT)-based, credit-based, or telemetry-based mechanisms.
  • RTT round-trip time
  • HPCC High Precision Congestion Control
  • these solutions require features that are not widely supported by legacy datacenter equipment, such as in-band network telemetry or accurate real-time RTT measurements.
  • CC mechanisms show a very good performance in DCNs. However, they rely on novel network architectures and/or protocol stacks that are not supported by most legacy switches deployed in current DCNs.
  • the most widely deployed CC standards are DCTCP in networks running the TCP/IP stack, and DCQCN in RDMA-based networks. Both of them, nevertheless, rely on static ECN-based mechanisms that do not adapt the ECN marking threshold to the observed traffic.
  • an objective of this disclosure is to improve conventional solutions for dynamically adapting ECN configuration in network systems. Another objective is to provide an agent and a corresponding method that can operate successfully under traffic and topology changes. Once it is deployed, an objective is that the agent is enabled to cooperate with neighbor agents to optimize a global property of the network. Further, an objective is to make the agent compatible with any datacenter network running ECN-based protocols, such as DCTCP or DCQCN.
  • an agent for dynamically adapting an ECN configuration in a network system is provided. The network system includes a plurality of agents, each agent being associated with a switch port.
  • the agent is configured to: send a first message to one or more neighbor agents, where the first message includes a property of the switch port associated with the agent; receive a second message from each neighbor agent of the one or more neighbor agents, the second message including a property of a neighbor switch port associated with the neighbor agent; and determine the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports.
  • This provides the advantage that the agent can independently adapt the ECN configuration of its associated port taking into consideration the internal properties (or attributes) of the switch port and the received communications (second messages) including the properties of neighboring ports.
  • the property of the switch port includes at least one of: a transmission rate, a queue length, and a rate of ECN marked packets of the switch port; and/or the property of the neighbor switch port includes at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port.
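As an illustration of the first and second messages described above, the following minimal sketch shows one way a port property could be packaged and exchanged. The class name `PortProperty`, the field names and units, and the JSON encoding are assumptions made for this example; the disclosure does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class PortProperty:
    """Port-level metrics named in the text (field names and units are hypothetical)."""
    tx_rate_gbps: float    # transmission rate of the switch port
    queue_len_kb: float    # queue length of the switch port
    ecn_mark_rate: float   # rate of ECN-marked packets, here a fraction in [0, 1]


def encode_message(sender_id: str, prop: PortProperty) -> bytes:
    """Serialize a message carrying the sender's port property (assumed JSON encoding)."""
    return json.dumps({"sender": sender_id, "property": asdict(prop)}).encode("utf-8")


def decode_message(raw: bytes) -> Tuple[str, PortProperty]:
    """Retrieve the sender identifier and port property from a received message."""
    obj = json.loads(raw.decode("utf-8"))
    return obj["sender"], PortProperty(**obj["property"])
```

A receiving agent would call `decode_message` on each second message to retrieve the neighbor's property before determining its own ECN configuration.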
  • before sending the first message to one or more neighbor agents, the agent is configured to collect the property of the switch port associated with the agent.
  • the property of the switch port is commonly supported by commercial switches and can be locally obtained with low computational overhead at microsecond timescales.
  • the agent is further configured to retrieve, from the second message, the property of the neighbor switch port associated with the neighbor agent.
  • the agent is further configured to update the property of the switch port associated with the agent by combining the property of the switch port and the properties of the one or more neighbor switch ports retrieved from the second message.
  • the agent is further configured to generate an updated first message, where the updated first message includes the updated property of the switch port associated with the agent.
  • the agent may thus share with the one or more neighbor agents the property of its associated switch port, which already incorporates information about the properties of the neighbor switch ports.
  • before the determination of the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the one or more neighbor switch ports, the agent is further configured to perform an iteration procedure including performing a predefined number of iterations.
  • in each iteration, the following steps are performed: sending the updated first message generated in a previous iteration to the one or more neighbor agents, or, in the case of the first iteration, sending the first message to the one or more neighbor agents, where the updated first message includes the updated property of the switch port associated with the agent; receiving an updated second message from each of the one or more neighbor agents generated in the previous iteration, or, in the case of the first iteration, receiving the second message from each of the one or more neighbor agents, where the updated second message includes an updated property of the neighbor switch port associated with the neighbor agent; retrieving, from the updated second message, the updated property of the neighbor switch port; updating the property of the switch port associated with the agent by combining the updated property of the switch port calculated in the previous iteration and the updated property of each neighbor switch port retrieved from the one or more second messages; and generating another updated first message including the updated property of the switch port associated with the agent.
  • the property of the switch port associated with the agent is updated so that valuable context to the rest of agents may be provided, facilitating cooperation between the agents and enabling each agent to adapt the ECN configuration of its corresponding switch port accordingly.
  • the predefined number of iterations is a low number, for example 2 or 3 iterations, because a diameter of conventional DCNs is limited. Further, the predefined number of iterations may be a constant value and, thus, there is no need to change the number of iterations when changes in traffic or in topology of the network system occur.
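The link between a small, fixed iteration count and the limited network diameter can be illustrated with a toy model: if each agent forwards everything it has learned so far, then after K iterations it has heard (directly or indirectly) from every agent within K hops. The sketch below tracks only which agents' information has reached each node; the function name `propagate` and the five-node line topology are assumptions for illustration, not part of the disclosure.

```python
from typing import Dict, List, Set


def propagate(adjacency: Dict[str, List[str]], rounds: int) -> Dict[str, Set[str]]:
    """Return, per node, the set of nodes whose information has reached it."""
    known = {node: {node} for node in adjacency}
    for _ in range(rounds):
        # Messages sent in a round carry the state from the end of the previous round.
        snapshot = {node: set(seen) for node, seen in known.items()}
        for node, neighbors in adjacency.items():
            for neighbor in neighbors:
                known[node] |= snapshot[neighbor]
    return known


# Hypothetical line topology a - b - c - d - e (diameter 4).
line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
reach_after_two = propagate(line, rounds=2)
```

With two rounds, node `a` has information originating from `a`, `b`, and `c`, while the central node `c` already covers the whole line. In a topology whose diameter is small, a constant of 2 or 3 iterations therefore gives each agent a near-global view.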
  • the determination of the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports includes determining an optimal ECN configuration for the switch port based on the final property of the switch port, where the final property of the switch port includes the updated property of the switch port after the last iteration; alternatively, in the case of the first iteration, determining an optimal ECN configuration for the switch port based on the property of the switch port and the properties of the neighbor switch ports received in the one or more second messages.
  • the determined optimal ECN configuration for the switch port includes an ECN configuration for the switch port that optimizes the performance of the network system.
  • the agent may cooperate with the one or more neighbor agents to efficiently adapt the ECN configuration of its associated switch port to the fast traffic dynamics of conventional DCNs, which makes it possible to optimize a global flow-aware goal of the network system.
  • the ECN configuration for the switch port that optimizes the performance of the network system includes at least one of an ECN configuration for the switch port that decreases a Flow completion time (FCT) of the network system, an ECN configuration for the switch port that reduces a buffer occupancy of the network system, and an ECN configuration for the switch port that increases a throughput of the network system.
  • FCT Flow completion time
  • the network system is configured based on a DCTCP protocol or a DCQCN protocol.
  • the agent according to this disclosure is compatible with any datacenter network running DCTCP or DCQCN protocols.
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof.
  • a method for an agent for dynamically adapting an ECN configuration in a network system is provided. The network system includes a plurality of agents, each agent being associated with a switch port, and the method includes: sending a first message to one or more neighbor agents, the first message including a property of the switch port associated with the agent; receiving a second message from each neighbor agent of the one or more neighbor agents, the second message including a property of a neighbor switch port associated with the neighbor agent; and determining the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports.
  • This provides the advantage that the agent can independently adapt the ECN configuration of its associated port taking into consideration the internal properties (or attributes) of the switch port and the received communications (second messages) including the properties of neighboring ports.
  • the property of the switch port includes at least one of: a transmission rate, a queue length, and a rate of ECN marked packets of the switch port; and/or the property of the neighbor switch port includes at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port.
  • before sending the first message to one or more neighbor agents, the method further comprises collecting the property of the switch port associated with the agent.
  • the property of the switch port is commonly supported by commercial switches and can be locally obtained with low computational overhead at microsecond timescales.
  • the method further comprises retrieving, from the second message, the property of the neighbor switch port associated with the neighbor agent.
  • the method further comprises updating the property of the switch port associated with the agent by combining the property of the switch port and the properties of the one or more neighbor switch ports retrieved from the second message.
  • the method further comprises generating an updated first message, where the updated first message includes the updated property of the switch port associated with the agent.
  • the agent may thus share with the one or more neighbor agents the property of its associated switch port, which already incorporates information about the properties of the neighbor switch ports.
  • before determining the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the one or more neighbor switch ports, the method further comprises performing an iteration procedure, including performing a predefined number of iterations.
  • in each iteration, the following steps are performed: sending the updated first message generated in a previous iteration to the one or more neighbor agents, or, in the case of the first iteration, sending the first message to the one or more neighbor agents, where the updated first message includes the updated property of the switch port associated with the agent; receiving an updated second message from each of the one or more neighbor agents generated in the previous iteration, or, in the case of the first iteration, receiving the second message from each of the one or more neighbor agents, where the updated second message includes an updated property of the neighbor switch port associated with the neighbor agent; retrieving, from the updated second message, the updated property of the neighbor switch port; updating the property of the switch port associated with the agent by combining the updated property of the switch port calculated in the previous iteration and the updated property of each neighbor switch port retrieved from the one or more second messages; and generating another updated first message including the updated property of the switch port associated with the agent.
  • the property of the switch port associated with the agent is updated so that valuable context to the rest of agents may be provided, facilitating cooperation between the agents and enabling each agent to adapt the ECN configuration of its corresponding switch port accordingly.
  • the predefined number of iterations is a low number, for example 2 or 3 iterations, because a diameter of conventional DCNs is limited. Further, the predefined number of iterations may be a constant value and, thus, there is no need to change the number of iterations when changes in traffic or in topology of the network system occur.
  • determining the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports includes determining an optimal ECN configuration for the switch port based on the final property of the switch port, where the final property of the switch port includes the updated property of the switch port after the last iteration; alternatively, in the case of the first iteration, determining an optimal ECN configuration for the switch port based on the property of the switch port and the properties of the neighbor switch ports received in the one or more second messages.
  • the determined optimal ECN configuration for the switch port includes an ECN configuration for the switch port that optimizes the performance of the network system.
  • the agent may cooperate with the one or more neighbor agents to efficiently adapt the ECN configuration of its associated switch port to the fast traffic dynamics of conventional DCNs, which makes it possible to optimize a global flow-aware goal of the network system.
  • the ECN configuration for the switch port that optimizes the performance of the network system includes at least one of an ECN configuration for the switch port that decreases a FCT of the network system, an ECN configuration for the switch port that reduces a buffer occupancy of the network system, and an ECN configuration for the switch port that increases a throughput of the network system.
  • the network system is configured based on a DCTCP protocol or a DCQCN protocol.
  • the method according to this disclosure is compatible with any datacenter network running DCTCP or DCQCN protocols.
  • a computer program is provided, comprising instructions which, when the program is executed by a computer, cause the computer to perform the method according to the second aspect and its implementation forms.
  • the computer program product according to the third aspect includes the features of the corresponding implementation forms of the method of the second aspect.
  • the solutions according to this disclosure provide a general agent for dynamically adapting the ECN configuration in a network system that achieves the following advantages: the agent is compatible with widely deployed ECN-based CC protocols (e.g., DCTCP, DCQCN) and, thus, may be directly deployed in other network systems with different traffic distributions. Moreover, the agent may determine the ECN configuration of its respective switch port, enabling the switch port to take actions while being aware of the properties of its neighbor switch ports. This may achieve global cooperation among the agents through message communications between neighboring switches, as well as a global reward, which in turn may provide a better optimization potential than conventional solutions based on local rewards.
  • the agent may be implemented by neural network architectures, which may allow it to be trained in controlled testbeds and then deployed directly in real DCNs, thereby greatly increasing the commercial viability of the agent.
  • this disclosure may provide a distributed in-network agent for CC optimization in a network system that may be scalable and robust against possible failures and reconfigurations with respect to centralized solutions, thereby reducing complexity.
  • FIG. 1 shows a schematic diagram of an agent for dynamically adapting an ECN configuration in a network system according to this disclosure
  • FIG. 2 schematically shows an exemplary network system including a plurality of agents, each agent being associated with a switch port, according to this disclosure
  • FIG. 3 shows an exemplary flowchart for dynamically adapting an ECN configuration where the agent is implemented by a message passing neural network, according to this disclosure
  • FIG. 4 shows an exemplary flowchart for dynamically adapting an ECN configuration where the agent is implemented in an artificial intelligence chip, according to this disclosure
  • FIG. 5 shows a method for an agent for dynamically adapting an ECN configuration in a network system according to this disclosure.
  • FIG. 1 shows a schematic diagram of an agent 100 for dynamically adapting an ECN configuration in a network system 1.
  • the network system 1 comprises a plurality of agents 100, 110 and each agent 100, 110 is associated with a switch port 102, 112.
  • the agent 100 is associated with the switch port 102, and each of the one or more neighbor agents 110 is associated with a switch port 112.
  • the agent 100 is configured to send a first message 104 to one or more neighbor agents 110.
  • the first message 104 comprises a property 101 of the switch port 102 associated with the agent 100.
  • the property 101 of the switch port 102 comprises any port-level metrics available at the switch port 102 that may help to optimize a global property of the network system 1, as it will be explained later in this description.
  • the property 101 of the switch port 102 comprises, for example but not as a limitation, at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the switch port 102.
  • the first message 104 may be generated based on the property 101 of the switch port 102, for example, by a processing block of the agent 100 (optional, thus shown in dashed lines), which obtains the property 101 of the switch port 102.
  • the agent 100 may comprise some internal attributes based on relevant metrics of its associated port 102 (e.g., utilization, queue length, etc. ) .
  • the agent 100 may be configured to collect the property 101 of its associated switch port 102 available at the switch port 102, for instance, at the optional processing block as indicated.
  • the property 101 of the switch port is commonly supported by commercial switches and can be locally obtained with low computational overhead at microsecond (μs) timescales.
  • the agent 100 is configured to receive a second message 114 from each neighbor agent 110 of the one or more neighbor agents 110.
  • the second message 114 comprises a property 103 of a neighbor switch port 112 associated with the neighbor agent 110.
  • the property 103 of the neighbor switch port 112 comprises any port-level metrics available at the neighbor switch port 112 that may help to optimize a global property of the network system 1, for example but not as a limitation, at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port 112.
  • each neighbor agent 110 may comprise some internal attributes based on relevant metrics of its associated port 112 (e.g., utilization, queue length, etc. ) .
  • the second message 114 may be generated based on the property 103 of the neighbor switch port 112 in the same manner as the first message 104 is generated at the agent 100 (e.g., by an optional processing block at the neighbor agent 110).
  • the agent 100 may be further configured to retrieve, from each of the received second messages 114, the property 103 of the neighbor switch port 112 associated with the neighbor agent 110.
  • the agent 100 is configured to determine the ECN configuration 106 for the switch port 102 associated with it, based on the property 101 of the switch port 102 and the properties 103 of the one or more neighbor switch ports 112 received in the plurality of second messages 114 (in FIG. 1, as an example, only one neighbor agent 110 is shown) .
  • each agent 100, 110 independently adapts the ECN configuration of its associated port 102, 112, taking into consideration its internal properties (or attributes) and the received communications (second messages) from neighboring agents 110.
  • the agent 100 may be configured to update the property 101 of the switch port 102 associated with the agent 100 by combining the property 101 of the switch port 102 and the properties 103 of the one or more neighbor switch ports 112 retrieved from the second message 114.
  • Combining the property 101 of the switch port 102 associated with the agent 100 and the properties 103 of the one or more neighbor switch ports 112 retrieved from the second message 114 may comprise, for example, aggregating the properties 103 of each of the one or more neighbor switch ports 112. This can be done through an element-wise operation (e.g., element-wise min or max).
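The element-wise aggregation mentioned above can be sketched as follows, treating each property as a vector of metrics in a fixed order. The text names element-wise min or max for aggregating the neighbor properties; how the aggregate is then merged with the agent's own property is not specified here, so the averaging in `combine` is purely an assumption for illustration, as are all function names.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def aggregate(neighbor_props: Sequence[Vector],
              op: Callable[[Sequence[float]], float] = max) -> Vector:
    """Aggregate neighbor property vectors element-wise (op is min or max)."""
    if not neighbor_props:
        raise ValueError("at least one neighbor property is required")
    # zip(*...) groups the i-th metric of every neighbor together.
    return [op(values) for values in zip(*neighbor_props)]


def combine(own: Vector, aggregated: Vector) -> Vector:
    """Merge own property with the neighbor aggregate (assumed: simple average)."""
    return [(a + b) / 2.0 for a, b in zip(own, aggregated)]


# Hypothetical vectors: [transmission rate, queue length, ECN mark rate].
neighbors = [[10.0, 200.0, 0.25], [40.0, 50.0, 0.75]]
agg_max = aggregate(neighbors, max)
agg_min = aggregate(neighbors, min)
updated = combine([20.0, 100.0, 0.25], agg_max)
```

Because min and max do not depend on the order or number of inputs, the same aggregation works unchanged for ports with different numbers of neighbors.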
  • the agent 100 may be configured to generate an updated first message 104.
  • the updated first message 104 may comprise the updated property 101 of the switch port 102 associated with the agent 100.
  • the agent 100 is further configured to perform an iteration procedure.
  • the iteration procedure comprises performing a predefined number of iterations.
  • the predefined number of iterations is a low value, for example 2 or 3 iterations, because a diameter of conventional DCNs is limited.
  • a suitable number of iterations may be equal to three.
  • the predefined number of iterations may be a constant value; hence, there is no need to change the number of iterations when changes in traffic or in topology of the network system 1 occur.
  • Each iteration comprises sending, by the agent 100, the updated first message 104 to the one or more neighbor agents 110, and receiving, by the agent 100, an updated second message 114 from each of the one or more neighbor agents 110.
  • in the first iteration, the updated first message 104 is the first message 104 and the updated second message 114 is the second message 114, as disclosed above.
  • the updated first message 104 comprises the updated property 101 of the switch port 102 associated with the agent 100 and that is generated in a previous iteration
  • the updated second message 114 comprises an updated property 103 of the neighbor switch port 112 associated with the neighbor agent 110 that is generated in a previous iteration.
  • each iteration comprises retrieving, by the agent 100, from the updated second message 114, the updated property 103 of the neighbor switch port 112. Then, each iteration comprises updating, by the agent 100, the property 101 of the switch port 102 associated with the agent 100 by combining the updated property 101 of the switch port 102 calculated in the previous iteration and the updated property 103 of each neighbor switch port 112 retrieved from the one or more second messages 114.
  • each iteration comprises generating, by the agent 100, another updated first message 104 comprising the updated property 101 of the switch port 102 associated with the agent 100.
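The per-iteration steps above (send, receive, retrieve, update, generate) can be sketched as a synchronous simulation over all agents. This is a simplified stand-in rather than the disclosed implementation: the update rule here is an element-wise max over the agent's own property and the received neighbor properties, chosen only to keep the example short, and the function name `run_iterations` and the three-port topology are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def run_iterations(props: Dict[str, Vector],
                   neighbors: Dict[str, List[str]],
                   num_iterations: int = 3) -> Dict[str, Vector]:
    """Run a fixed number of message-passing iterations and return final properties."""
    state = {port: list(vec) for port, vec in props.items()}
    for _ in range(num_iterations):
        # Send: each agent publishes the property it held at the end of the
        # previous iteration (the initial property in the first iteration).
        outbox = {port: list(vec) for port, vec in state.items()}
        for port, nbrs in neighbors.items():
            # Receive and retrieve the neighbors' (updated) properties.
            received = [outbox[n] for n in nbrs]
            # Update: combine own and neighbor properties (assumed element-wise max).
            state[port] = [max(vals) for vals in zip(outbox[port], *received)]
    return state


# Hypothetical line of three ports with a single metric (queue length).
initial = {"p1": [10.0], "p2": [50.0], "p3": [200.0]}
links = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2"]}
after_one = run_iterations(initial, links, num_iterations=1)
after_two = run_iterations(initial, links, num_iterations=2)
```

After one iteration `p1` only reflects its direct neighbor `p2`; after two iterations it also reflects the congested port `p3`, two hops away. The final vectors would then be the input to whatever procedure determines the ECN configuration.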
  • the determination of the ECN configuration 106 for the switch port 102 associated with the agent 100 based on the property of the switch port 102 and the properties 103 of the neighbor switch ports 112 comprises determining, by the agent 100, an optimal ECN configuration for the switch port 102 based on the final property of the switch port 102, where the final property 101 of the switch port 102 comprises the updated property 101 of the switch port 102 after the last iteration.
  • the determination of the ECN configuration 106 for the switch port 102 associated with the agent 100 based on the property of the switch port 102 and the properties 103 of the neighbor switch ports 112 comprises determining an optimal ECN configuration for the switch port 102 based on the property 101 of the switch port 102 and the properties 103 of the neighbor switch ports 112 received in the one or more second messages 114.
  • the property 101 of the switch port 102 is optimized so as to provide valuable context to the neighbor agents 110, facilitating cooperation between them and enabling each agent 100, 110 to optimally adapt the ECN settings 106 of its respective switch port 102, 112 accordingly. That is, each agent 100, 110 can independently adapt the ECN configuration of its associated port 102, 112, taking into consideration its internal attributes and the properties 103 of the one or more neighbor switch ports 112 received in the communications exchanged with the neighbor agents 110. Those internal attributes may be used to update the properties of each switch port 102, 112, and the agents 100, 110 share the updated properties 101, 103 of the respective switch port 102, 112 with their neighbors 110 via messages 104, 114 over the network 1.
  • the determined optimal ECN configuration 106 for the switch port 102 comprises an ECN configuration 106 for the switch port 102 that optimizes the performance of the network system 1.
  • the distributed ECN configuration 106 for the switch port 102 makes the solution according to this disclosure highly scalable to large networks.
  • the ECN configuration 106 for the switch port 102 that optimizes the performance of the network system 1 comprises at least one of an ECN configuration 106 for the switch port 102 that decreases a flow completion time (FCT) of the network system 1, an ECN configuration for the switch port 102 that reduces a buffer occupancy of the network system 1, and an ECN configuration for the switch port 102 that increases a throughput of the network system 1.
  • the agent 100 may use any suitable optimization method that enables optimizing a global flow-level metric of the network system 1 (e.g., the FCT, the buffer occupancy and/or the throughput of the network 1).
  • each of the one or more neighbor agents 110 is configured to perform exactly the same process as described above for the agent 100, so that all the agents 100, 110 move from a local to a global context awareness of the network 1.
  • the network system 1 is configured based on a DCTCP protocol or a DCQCN protocol. That is, the plurality of agents 100, 110 according to this disclosure are compatible with any datacenter network running DCTCP or DCQCN protocols.
  • the agent 100 may further optimize CC in the network system 1
  • the agent 100 can optimize the handling of congestion notifications within switches 102, 112, which is a crucial component of CC protocols to efficiently optimize traffic in the network system 1.
  • switches mark a Congestion Experienced (CE) bit of packets in case the queue length exceeds some predefined thresholds.
  • DCTCP protocol adopts a hard cutoff, for example that all packets are marked when the queue length exceeds a certain value k. Instead, the DCQCN protocol implements a softer RED-like probabilistic approach with three configuration parameters {k min , k max , p max } given in equation (1) :
  • q len is the instantaneous queue length in the port
  • p mark is the probability for marking a packet
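Equation (1) itself is not reproduced in this text. As a minimal sketch, assuming the usual RED-style piecewise-linear ramp (zero below k min, rising linearly to p max at k max, and one above k max), the two marking rules could look as follows. The function names are hypothetical, and the boundary conventions (strict versus non-strict comparisons) are assumptions rather than the definition given in the disclosure.

```python
import random


def dctcp_should_mark(q_len: float, k: float) -> bool:
    """Hard cutoff: mark every packet once the queue length exceeds k."""
    return q_len > k


def red_mark_probability(q_len: float, k_min: float, k_max: float, p_max: float) -> float:
    """Assumed RED-like ramp: 0 up to k_min, linear up to p_max at k_max, 1 above k_max."""
    if k_max <= k_min:
        raise ValueError("k_max must be greater than k_min")
    if q_len <= k_min:
        return 0.0
    if q_len > k_max:
        return 1.0
    return p_max * (q_len - k_min) / (k_max - k_min)


def dcqcn_should_mark(q_len, k_min, k_max, p_max, rng=random.random) -> bool:
    """Mark a packet with probability p_mark; rng is injectable for testing."""
    return rng() < red_mark_probability(q_len, k_min, k_max, p_max)
```

Under these assumptions, a larger k (or k min) delays marking and tolerates deeper queues, while a smaller value signals congestion earlier at the cost of throughput, which is the trade-off the agent 100 adapts per port.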
  • the determined ECN configuration 106 for the switch port 102 associated with the agent 100 may comprise, for example and not as a limitation, a value k that optimizes the performance of the network system 1 (i.e., that optimizes at least one of the FCT, the buffer occupancy and the throughput of the network system 1) .
  • the determined ECN configuration 106 for the switch port 102 associated with the agent 100 may comprise, for example and not as a limitation, values of the quantities k min , k max , p max and/or p mark that optimize the performance of the network system 1 (i.e., that optimize at least one of the FCT, the buffer occupancy and the throughput of the network system 1) .
  • the agent 100 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the agent 100 described herein.
  • the processing circuitry may comprise hardware and software.
  • the hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry.
  • the digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors.
  • the agent 100 may further comprise memory circuitry, which stores one or more instruction(s) that can be executed by the processor or by the processing circuitry, in particular under control of the software.
  • the memory circuitry may comprise a non-transitory storage medium storing executable software code which, when executed by the processor or the processing circuitry, causes the various operations of the agent 100 to be performed.
  • the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors.
  • the non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the agent 100 to perform, conduct or initiate the operations or methods described herein.
  • the agents 100, 110 may enable in-network optimization at intermediate switch ports 102, 112. Further, the agent 100 may provide a fully decentralized mechanism. The only observable mechanism in the network system 1, thus, may be the message communications between neighboring agents 100, 110. The first message 104 and the second message 114 provide contextual information to the agent 100 and, moreover, a pattern at which the first message 104 and the second message 114 are sent may be recognizable.
  • the agents 100, 110 of this exemplary embodiment may reduce the complexity of the network system 1 by providing a solution that is more scalable and more robust against possible failures and reconfigurations than conventional centralized solutions. Further, as the agent 100 is compatible with widely deployed ECN-based CC protocols (e.g., DCTCP or DCQCN), the agent 100 may be directly deployed in other network systems 1 with different traffic distributions, increasing the efficiency and the commercial viability of the agent 100.
  • the agent 100 may provide topology-aware local context, as it may enable its associated switch port 102 to take actions (for example, to determine whether a packet may be marked) while being aware of the properties 103 of its neighbor switch ports 112. Thereby, global cooperation is achieved through message communications between the agent 100 and the one or more neighbor agents 110. Additionally, a global reward may be achieved, since the agent 100 may determine the ECN configuration for the switch port 102 that optimizes a goal of the whole network system 1, which may achieve better results than conventional solutions based on local rewards.
  • FIG. 2 schematically depicts an exemplary network system 1 according to this disclosure. Same elements are labelled with the same reference signs.
  • the exemplary network system 1 comprises a plurality of agents 100, 110, exemplary agents A 1 to A 9 , and each agent 100, 110 is associated with a switch port 102, 112.
  • Each agent 100, for example agent A 4 , may be configured to send a first message 104 to one or more neighbor agents 110, for example agents A 2 , A 3 , A 5 , A 7 and A 9 , and to receive a second message 114 from each of the one or more neighbor agents 110 A 2 , A 3 , A 5 , A 7 and A 9 , where the first message 104 comprises a property 101 of the switch port 102 associated with the agent 100 and each of the second messages 114 comprises a property 103 of the neighbor switch port 112 associated with the neighbor agent 110.
  • each port 102, 112 can be an egress port. Further, there can be as many agents 100, 110 as ports in each switch, so that each agent 100, 110 can control the ECN configuration of a particular port.
  • each agent 100, for example agent A 4 , may also send the first message 104 to other exemplary agents, for example to agents A 1 , A 6 and/or A 8 , and may also receive the second message 114 from them. This is not limited in this disclosure.
  • the exemplary agents A 1 , A 4 , A 6 and A 8 may form a spine of the network system 1, and the exemplary agents A 2 , A 3 , A 5 , A 7 and A 9 may form a leaf of the network system 1.
  • the exemplary network system 1 of FIG. 2 may be configured based on a DCTCP protocol or a DCQCN protocol.
  • the agent 100 may be implemented in a type of Graph Neural Networks (GNN) called Message Passing Neural Network (MPNN) .
  • This MPNN architecture may help the agents 100, 110 to process and model information of the state of the network system 1, and may enable propagating local information over the whole network system 1 by allowing message communications between agents 100, 110, so that they can update their internal attributes depending on the second message 114 received from each neighbor agent 110.
  • each port-level agent 100, 110 may eventually learn what information is relevant to exchange with its neighbor agents 110, and may also discover a manner to adapt the ECN configuration 106 of its corresponding switch port 102 to optimize the global objective (e.g., minimize the FCT, minimize the buffer occupancy, and/or maximize the throughput of the network system 1).
  • the agents 100, 110 can optimize the property 101, 103 of the respective switch ports 102, 112 jointly.
  • the agent 100 implemented in MPNN can be replicated and deployed across distributed devices of any other DCN.
  • FIG. 3 depicts an exemplary embodiment of a flowchart performed by the agent 100 according to this disclosure. Same elements are labelled with the same reference signs.
  • the agent 100 is implemented in a MPNN
  • each of the plurality of agents 100, 110, i.e., the agent 100 and the one or more neighbor agents 110, may be implemented in a MPNN and may perform the same processes as disclosed in the following for the agent 100.
  • the first message 104 may comprise the property 101 of the switch port 102 associated with the agent 100 in the form of a hidden state of the agent 100
  • each of the second messages 114 may comprise the property 103 of the neighbor switch port 112 associated with the neighbor agent 110 in the form of a hidden state of the neighbor agent 110.
  • the agent 100 may comprise three main modules: an Information Retrieval Module 302, a Hidden State Generation Module 304, and an Action Selection Module 306.
  • the Information Retrieval Module 302 may receive the second message 114 from each of the one or more neighbor agents 110.
  • the Information Retrieval Module 302 can be implemented by a Neural Network (NN) module that may be configured to process the second messages 114 received from the one or more neighbor agents 110, and to further select relevant information of the hidden states of the neighbor agents 110 that should be considered.
  • the Information Retrieval Module 302 may be further configured to aggregate the selected relevant information of the hidden states of the neighbor agents 110.
  • the aggregation can be done, for example, through an element-wise operation (for example element-wise min or max) .
  • Information Retrieval Module 302 may be configured to output the aggregated information of the hidden states of the neighbor agents 110.
  • the output of the Information Retrieval Module 302 may be used by the Hidden State Generation Module 304.
  • the Hidden State Generation Module 304 can be based on a NN, and can be configured to combine the aggregated selected information of the hidden states of the neighbor agents 110 with the hidden state of the agent 100, thus generating a new or updated hidden state for the agent 100.
  • the Information Retrieval Module 302 and the Hidden State Generation Module 304 may form a Communication Module 308 that corresponds to the operation of the MPNN. In variants of the MPNN, the functions executed by the Information Retrieval Module 302 and the Hidden State Generation Module 304 may not necessarily be modeled by a NN.
  • the Communication Module 308, additionally or alternatively the Hidden State Generation Module 304, may be configured to generate an updated first message 104 comprising the updated hidden state for the agent 100, and to subsequently send the updated first message 104 to the one or more neighbor agents 110.
  • the Communication Module 308 may be executed for a predefined number of iterations.
  • Each of the one or more neighbor agents 110 may perform exactly the same process, so that all the agents 100, 110 move from a local to a global context awareness.
  • the Action Selection Module 306 of the agent 100 may be configured to determine an optimal ECN configuration for the switch port 102 associated with the agent 100 based on the final hidden state of the agent 100, which takes into account the updated hidden state of each of the one or more neighbor agents 110.
  • the final hidden state of the agent 100 may comprise the updated hidden state of the agent 100 after the last iteration.
  • the Action Selection Module 306 can be implemented by a NN that produces, as an output, a final action policy as depicted in FIG. 3, where the final action policy may comprise the ECN configuration 106 of the switch port 102 associated with the agent 100.
  • the determined optimal ECN configuration for the switch port 102 associated with the agent 100 may comprise an ECN configuration 106 for the switch port 102 that optimizes the performance of the network system 1.
  • the ECN configuration 106 for the switch port 102 that optimizes the performance of the network system 1 comprises at least one of the ECN configuration 106 for the switch port 102 that decreases the FCT of the network system 1, the ECN configuration for the switch port 102 that reduces the buffer occupancy of the network system 1, and the ECN configuration for the switch port 102 that increases the throughput of the network system 1.
  • the hidden state representations of the agents 100, 110 are central elements, since they may be the actual content of message communications.
  • the hidden states of the agents 100, 110 need to encode not only the agent 100, 110 discriminant features required to define its individual behavior, but also relevant information about the context to facilitate the action decision making of neighboring agents 110.
  • the agent 100 may collect some port-level metrics available at its corresponding switch port 102, for example and not as a limitation, at least one of the transmission rate or utilization of the switch port 102, the instantaneous queue length and the rate of ECN-marked packets in the switch port 102. These metrics represent the input features depicted in FIG. 3.
  • the agent 100 may initialize its hidden state based on its corresponding input features. That is, the input features are encoded into a hidden state of the agent 100, and the hidden state of the agent 100 may be a fixed-size vector.
  • the initialization may be performed, for example, by adding the input features and applying zero-padding to fit a dimension (i.e., the fixed size) of the hidden state vector for the agent 100.
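The initialization described above can be illustrated with a short sketch. It assumes the three port-level metrics named in this disclosure as input features and a hidden state dimension of 8. The class name `PortMetrics`, the function name `init_hidden_state` and the default dimension are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortMetrics:
    """Port-level input features (field names are illustrative)."""
    tx_rate: float        # transmission rate or utilization of the port
    q_len: float          # instantaneous queue length
    ecn_mark_rate: float  # rate of ECN-marked packets


def init_hidden_state(metrics: PortMetrics, hidden_dim: int = 8) -> List[float]:
    """Place the input features first, then zero-pad to the fixed size."""
    features = [metrics.tx_rate, metrics.q_len, metrics.ecn_mark_rate]
    if hidden_dim < len(features):
        raise ValueError("hidden_dim must be at least the number of input features")
    return features + [0.0] * (hidden_dim - len(features))
```

The zero-padded positions leave room for information that is only filled in later, once messages from the neighbor agents 110 have been processed.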
  • the agent 100 may send the first message 104 to the one or more neighbor agents 110.
  • the first message may comprise the hidden state of the agent 100. Given that each agent 100, 110 represents egress ports 102, 112, the agent 100 may actually send the first message 104 to all ports 112 that can potentially receive traffic from it.
  • the agent 100 may receive the hidden states of all of its neighbor agents 110, and may process them as follows:
  • the hidden state of each neighbor agent 110 may be combined with the hidden state of the agent 100 through a Message Function 310.
  • the Message Function 310 can be implemented by a NN module, for example by the Information Retrieval Module 302.
  • the combined hidden state of the agent 100 and hidden states of the one or more neighbor agents 110 may be aggregated using a predefined Aggregation Function 312, e.g., an element-wise max or min function.
  • the agent 100 may update its own hidden state with the aggregated hidden states.
  • This task can be performed by an Update Function 314, which may be implemented by a NN module, for example by Hidden State Generation Module 304.
  • the agent 100 may generate an updated hidden state of the agent 100 that potentially incorporates information of the one or more neighbor agents 110.
  • the message exchange and processing described in the third and fourth steps may be repeated for a predefined number of iterations.
  • the updated hidden states of the one or more neighbor agents 110 received in the one or more second messages 114 may be considered. During these iterations:
  • the range of the network 1 covered by the agent 100 expands at successive message passing iterations. At each iteration, the agent 100 may have access to information of more distant points (agents 110) in the network 1.
  • the hidden state of the agent 100 and the hidden states of the neighbor agents 110 may evolve from sparse data to much more dense representations in hidden state vectors as the iterations of the message passing procedure are executed.
  • the number of iterations may be predefined, and typically is a low value, e.g. 2 or 3 iterations, because the diameter of DCNs is limited.
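The message passing procedure above can be sketched as follows. In the disclosure, the Message Function 310 and the Update Function 314 may be NN modules. This sketch replaces them with fixed placeholder functions (pass-through and element-wise max) purely so that the example is runnable, and all function names are hypothetical. Only the Aggregation Function 312, an element-wise max, follows an option named in the text.

```python
from typing import Dict, List

Vector = List[float]


def message_fn(own: Vector, neighbor: Vector) -> Vector:
    """Placeholder for the learned Message Function 310: forwards the neighbor state."""
    return list(neighbor)


def aggregate_max(messages: List[Vector]) -> Vector:
    """Aggregation Function 312 as an element-wise max over all messages."""
    return [max(values) for values in zip(*messages)]


def update_fn(own: Vector, aggregated: Vector) -> Vector:
    """Placeholder for the learned Update Function 314: element-wise max."""
    return [max(o, a) for o, a in zip(own, aggregated)]


def message_passing(hidden: Dict[str, Vector],
                    neighbors: Dict[str, List[str]],
                    iterations: int) -> Dict[str, Vector]:
    """Run synchronous message passing rounds and return the final hidden states."""
    for _ in range(iterations):
        new_hidden = {}
        for agent, own in hidden.items():
            msgs = [message_fn(own, hidden[n]) for n in neighbors[agent]]
            # Agents without neighbors keep their state unchanged.
            new_hidden[agent] = update_fn(own, aggregate_max(msgs)) if msgs else list(own)
        hidden = new_hidden
    return hidden
```

On a three-agent chain a, b, c where only c starts with a non-zero feature, agent a sees that feature after two iterations but not after one, which illustrates why a small number of iterations can suffice when the network diameter is limited.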
  • the agent 100 may individually evaluate its final hidden state through a Readout Function 316.
  • the Readout Function 316 can be implemented by another NN module, for example the Action Selection Module 306.
  • the Readout Function 316 may provide the agent 100 with its action policy, i.e., with the optimal ECN configuration for the switch port 102 associated with the agent 100.
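As an illustration of the readout step, the sketch below scores a small set of candidate ECN configurations with a linear layer applied to the final hidden state and returns the best-scoring one. The discrete candidate set, the linear scoring, the weight values and the names `EcnConfig` and `readout` are all assumptions made for this example. The disclosure only states that the Readout Function 316 can be implemented by a NN module.

```python
from typing import List, NamedTuple, Sequence


class EcnConfig(NamedTuple):
    """Candidate DCQCN-style configuration (values are illustrative)."""
    k_min: int
    k_max: int
    p_max: float


def readout(hidden: Sequence[float],
            weights: Sequence[Sequence[float]],
            candidates: List[EcnConfig]) -> EcnConfig:
    """Score each candidate as a dot product with the hidden state and pick the best."""
    if len(weights) != len(candidates):
        raise ValueError("one weight row is required per candidate configuration")
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best]
```

In a trained agent the weights would be learned, for example through Reinforcement Learning, rather than set by hand.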
  • the exemplary implementation of the agent 100 by MPNN may be trained using, for example and not as a limitation, a Reinforcement Learning algorithm.
  • the agent 100 may be trained offline (e.g., in a controlled testbed) and can then be directly deployed in production networks, without the need for re-training on premises. Hence, it may avoid the uncertainty of online training in production networks.
  • offline training may enable extensive testing before deployment, with the possibility of issuing certifications determining the operational ranges that the agent 100 can safely support (e.g., link capacities or maximum network size). This certification process is aligned with the conventional way in which networking products are currently commercialized.
  • the agent 100 may learn how to communicate with the one or more neighbor agents 110 to dynamically adapt the ECN configuration 106 of its respective switch port 102 that optimizes a global flow-level metric (e.g., FCT) of the network system 1.
  • the message communications may enable a global cooperation between the agents 100, 110, as they can take actions based not only on their local state information, but also on data from their neighbor agents 110.
  • the agent 100 is compatible with any data center network running DCTCP or DCQCN protocols. That is, when the agent 100 and the neighbor agents 110 are implemented by MPNN, the network system 1 may be configured based on a DCTCP protocol or a DCQCN protocol.
  • the determined ECN configuration 106 for the switch port 102 associated with the agent 100 may comprise, for example and not as a limitation, a value k that optimizes the performance of the network system 1 configured based on a DCTCP protocol.
  • the determined ECN configuration 106 for the switch port 102 associated with the agent 100 may comprise values of the quantities k min , k max , p max and/or p mark that optimize the performance of the network system 1 configured based on a DCQCN protocol, as depicted in FIG. 3.
  • ACC is designed for online training and, thus, it needs to be re-trained to gradually learn how to adapt to the current network conditions.
  • online training is not always appropriate in production environments, as it carries an implicit uncertainty on what would be the resulting performance of agents after re-training.
  • this exemplary embodiment for the agent 100 may alleviate the problems and disadvantages of conventional solutions based on ML and NNs in DCNs.
  • FIG. 4 shows an exemplary embodiment of a flowchart performed by the agent 100 according to this disclosure. Same elements are labelled with the same reference signs.
  • the agent 100 and the neighbor agents 110 may be implemented, for example and not as a limitation, in chips 400, 410 optimized for Artificial Intelligence (AI) applications.
  • the chips 400, 410 may be embedded in switch ports 102, 112 associated with the agents 100, 110.
  • Each chip 400, 410 may comprise a network interface card (NIC) .
  • the agent 100 may retrieve the property of the switch port 102 associated with it from its local NIC.
  • the property 101 of the switch port 102 may comprise any port-level metrics available at the switch port 102 that may help to optimize a global property of the network system 1, for example and not as a limitation, at least one of a transmission rate of the switch port 102 in the form of bytes transmitted by the NIC, depicted as tx rate in FIG. 4, a queue length, q len , of the switch port 102 and a rate of ECN marked packets, ECN marks of the switch port 102.
  • These properties are commonly supported by commercial switches, and can be locally obtained by the agent 100 with low computational overhead at timescales of the order of microseconds.
  • the agent 100 may start a communication with the one or more neighbor agents 110 deployed in neighbor switches 112 to gain a local context. This communication is done by sending the first message 104 to the one or more neighbor agents 110.
  • the communications may be performed, for example and not as a limitation, through a NN-driven message passing, as disclosed above for the exemplary implementation of the agent 100 by MPNN depicted in FIG. 3, where the agents 100, 110 exchange the first message 104 and the second messages 114 directly encoded by NN modules in order to find the optimal ECN configuration 106 for the switch port 102 in its associated NIC.
  • This communication only requires exchanging a few bytes with neighboring switches 112 and can take a few microseconds (μs).
  • the base link propagation delay in production DCNs is typically of the order of 1 μs.
  • the agent 100 may directly interface with forwarding chips using a respective Application Programming Interface (API) , which is typically vendor-specific.
  • the agent 100 may be further configured to mark packets according to the determined optimal ECN configuration 106 for the switch port 102.
  • the agent 100 may be configured to act as an end-host.
  • a CC protocol can be executed at the end-host.
  • the end-host may adjust the flow transmission rate of the respective switch port 102, 112 based on ECN feedback. This process may be as follows: if a host receives an ECN-marked packet, the host may notify the sender of it by using an acknowledgement (ACK).
  • the protocol-specific algorithm may comprise, for example, an Additive-Increase/Multiplicative-Decrease protocol.
  • This CC mechanism may enable a gradual reaction at one-RTT timescales, which are approximately 10 μs in high-speed DCNs.
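A generic Additive-Increase/Multiplicative-Decrease rule of the kind mentioned above can be sketched as follows. This is a textbook AIMD step, not the exact rate update of DCTCP or DCQCN, and the function name, step sizes and rate bounds are hypothetical values chosen for illustration.

```python
def aimd_update(rate: float,
                ecn_marked: bool,
                increase: float = 1.0,
                decrease_factor: float = 0.5,
                min_rate: float = 1.0,
                max_rate: float = 100.0) -> float:
    """Return the sender's next rate after one round of ECN feedback.

    ecn_marked is True when the ACKs of the last round reported ECN marks.
    """
    if ecn_marked:
        new_rate = rate * decrease_factor  # multiplicative decrease on congestion
    else:
        new_rate = rate + increase         # additive increase otherwise
    return max(min_rate, min(max_rate, new_rate))
```

Because the sender only reacts to the marks it receives, the ECN configuration chosen at the switch port determines how early and how often this decrease branch is taken.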
  • the in-network optimization achieved in this disclosure is orthogonal and complementary to the selection of the flow rate control algorithm (e.g., DCTCP or DCQCN).
  • agent 100 is compatible with any ECN-based CC protocol and can be deployed along with any other well-established traffic optimization techniques, such as flow scheduling.
  • FIG. 5 shows an exemplary method 500 for an agent 100 for dynamically adapting an ECN configuration in a network system 1 according to this disclosure.
  • the network system 1 comprises a plurality of agents 100, 110, each agent 100, 110 being associated with a switch port 102, 112, as disclosed before in this disclosure.
  • the method 500 comprises a step 502 of sending a first message 104 to one or more neighbor agents 110.
  • the first message 104 comprises a property 101 of the switch port 102 associated with the agent 100, and the property of the switch port 102 comprises any port-level metrics available at the switch port 102 that may help to optimize a global property of the network system 1, for example and not as a limitation, at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the switch port 102.
  • the method 500 comprises a step 504 of receiving a second message 114 from each neighbor agent 110 of the one or more neighbor agents 110.
  • the second message 114 comprises a property 103 of a neighbor switch port 112 associated with the neighbor agent 110.
  • the property 103 of the neighbor switch port 112 comprises any port-level metrics available at the neighbor switch port 112 that may help to optimize a global property of the network system 1, for example and not as a limitation, at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port 112.
  • the method 500 further comprises a step 506 of determining the ECN configuration 106 for the switch port 102 associated with the agent 100 based on the property 101 of the switch port 102 and the properties 103 of the neighbor switch ports 112.
  • the method 500 may further comprise actions according to the described aforementioned embodiments of the agent 100. Hence, the method 500 achieves the same advantages as the agent 100.
  • the present disclosure further provides a computer program comprising instructions that, when the program is executed by a computer, cause the computer to perform the method 500 shown in FIG. 5.
  • the computer program may be included in a computer readable medium.
  • the computer readable medium may comprise essentially any memory, such as a ROM (Read-Only Memory), a PROM (Programmable Read-Only Memory), an EPROM (Erasable PROM), a Flash memory, an EEPROM (Electrically Erasable PROM), or a hard disk drive.
  • the computer program achieves the same advantages as the method 500 and as the agent 100.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

An agent for dynamically adapting an Explicit Congestion Notification (ECN) configuration in a network system is provided. The agent is configured to send a first message to one or more neighbor agents, the first message comprising a property of a switch port associated with the agent. The agent is further configured to receive a second message from each neighbor agent, the second message comprising a property of a neighbor switch port associated with the neighbor agent. The agent is further configured to determine the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports. A property of a switch port may comprise at least one of a transmission rate, a queue length, and a rate of ECN marked packets at the respective switch port.

Description

DEVICE AND METHOD FOR AGENT FOR DYNAMICALLY ADAPTING EXPLICIT CONGESTION NOTIFICATION CONFIGURATION IN NETWORK SYSTEM
TECHNICAL FIELD
The present disclosure relates to an agent for dynamically adapting an Explicit Congestion Notification (ECN) configuration in a network system, the network system comprising a plurality of agents, and each agent being associated with a switch port. The disclosure further provides a corresponding method and a computer program to perform the method.
BACKGROUND
Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCNs). Currently, DCNs implement two main CC protocols: Data Center Transmission Control Protocol (DCTCP) and Data Center Quantized Congestion Notification (DCQCN). Both protocols, as well as their main variants, are based on ECN, wherein intermediate switches mark packets when they detect congestion. The ECN configuration is thus a crucial factor in the performance of widely deployed CC protocols.
Nowadays, network experts set static ECN parameters carefully selected to optimize the average network performance (e.g., throughput, latency). However, high-speed DCNs experience quick and abrupt events that severely alter the network state (e.g., dynamic traffic workloads, incast events and failures). With static parameters, this leads to under-utilization and sub-optimal performance.
CC has been extensively studied. As a result, there exist a number of solutions for DCNs tackling the problem from different angles, such as round-trip time (RTT)-based, credit-based, or telemetry-based mechanisms. Currently, most production DCNs implement the two main well-established CC protocols DCTCP and DCQCN. The former is the main standard in traditional networks based on the TCP/IP stack, while the latter is the standard in modern RDMA-based networks. Both mechanisms, as well as their enhanced schemes, rely on ECN, so that switches mark packets when they experience congestion, and end-hosts dynamically adapt their transmission rate according to this congestion feedback. The most advanced conventional CC mechanisms, such as High Precision Congestion Control (HPCC) or Swift, have been successfully validated in production networks, showing a good performance compared to their contemporaries. However, these solutions require features that are not widely supported by legacy datacenter equipment, such as in-band network telemetry or accurate real-time RTT measurements.
Some of the proposed CC mechanisms show a very good performance in DCNs. However, they rely on novel network architectures and/or protocol stacks that are not supported by most legacy switches deployed in current DCNs. The most widely deployed CC standards are DCTCP in networks running the TCP/IP stack, and DCQCN in RDMA-based networks. Both of them, nevertheless, rely on static ECN-based mechanisms that do not adapt the ECN marking threshold depending on the observed traffic.
SUMMARY
In view of the above, an objective of this disclosure is to improve conventional solutions for dynamically adapting ECN configuration in network systems. Another objective is to provide an agent and a corresponding method that can operate successfully under traffic and topology changes. Once it is deployed, an objective is that the agent is enabled to cooperate with neighbor agents to optimize a global property of the network. Further, an objective is to make the agent compatible with any datacenter network running ECN-based protocols, such as DCTCP or DCQCN.
These and other objectives are achieved by the solutions provided in the independent claims. Advantageous implementations are further defined in the dependent claims.
According to a first aspect, an agent for dynamically adapting an ECN configuration in a network system is provided. The network system includes a plurality of agents, each agent being associated with a switch port. The agent is configured to: send a first message to one or more neighbor agents, where the first message includes a property of the switch port associated with the agent; receive a second message from each neighbor agent of the one or more neighbor agents, the second message including a property of a neighbor switch port associated with the neighbor agent; and determine the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports.
This provides the advantage that the agent can independently adapt the ECN configuration of its associated port taking into consideration the internal properties (or attributes) of the switch port and the received communications (second messages) including the properties of neighboring ports.
In an implementation form of the first aspect, the property of the switch port includes at least one of: a transmission rate, a queue length, and a rate of ECN marked packets of the switch port; and/or the property of the neighbor switch port includes at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port.
In an implementation form of the first aspect, before sending the first message to one or more neighbor agents, the agent is configured to collect the property of the switch port associated with the agent.
The property of the switch port is commonly supported by commercial switches and can be locally obtained with low computational overhead at microsecond timescales.
In an implementation form of the first aspect, the agent is further configured to retrieve, from the second message, the property of the neighbor switch port associated with the neighbor agent.
This may enable the agent to additionally determine relevant information of the one or more switch ports received in the second messages.
In an implementation form of the first aspect, the agent is further configured to update the property of the switch port associated with the agent by combining the property of the switch port and the properties of the one or more neighbor switch ports retrieved from the second message.
This provides the advantage that the agent may acquire valuable context from the rest of agents, which may be taken into account to adapt the ECN configuration of its respective switch port.
In an implementation form of the first aspect, the agent is further configured to generate an updated first message, where the updated first message includes the updated property of the switch port associated with the agent.
Thereby, the agent may further share with the one or more neighbor agents the property of its associated switch port, which already incorporates information on the properties of the neighbor switch ports.
In an implementation form of the first aspect, before the determination of the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the one or more neighbor switch ports, the agent is further configured to perform an iteration procedure including performing a predefined number of iterations.
In each iteration the following steps are performed: sending the updated first message generated in a previous iteration to the one or more neighbor agents, or in the case of the first iteration sending the first message to the one or more neighbor agents, where the updated first message includes the updated property of the switch port associated with the agent; receiving an updated second message from each of the one or more neighbor agents generated in the previous iteration, or in the case of the first iteration receiving the second message from each of the one or more neighbor agents, where the updated second message includes an updated property of the neighbor switch port associated with the neighbor agent; retrieving, from the updated second message, the updated property of the neighbor switch port; updating the property of the switch port associated with the agent by combining the updated property of the switch port calculated in the previous iteration and the updated property of each neighbor switch port retrieved from the one or more second messages; and generating another updated first message including the updated property of the switch port associated with the agent.
At each message passing iteration, the property of the switch port associated with the agent is updated so that valuable context may be provided to the rest of the agents, facilitating cooperation between the agents and enabling each agent to adapt the ECN configuration of its corresponding switch port accordingly.
Notably, the predefined number of iterations is a low number, for example 2 or 3 iterations, because a diameter of conventional DCNs is limited. Further, the predefined number of iterations may be a constant value and, thus, there is no need to change the number of iterations when changes in traffic or in topology of the network system occur.
In an implementation form of the first aspect, the determination of the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports includes determining an optimal ECN configuration for the switch port based on the final property of the switch port, where the final property of the switch port includes the updated property of the switch port after the last iteration; alternatively, in the case of the first iteration, determining an optimal ECN configuration for the switch port based on the property of the switch port and the properties of the neighbor switch ports received in the one or more second messages.
In an implementation form of the first aspect, the determined optimal ECN configuration for the switch port includes an ECN configuration for the switch port that optimizes the performance of the network system.
This provides the advantage that the agent may cooperate with the one or more neighbor agents to efficiently adapt the ECN configuration of its associated switch port to the fast traffic dynamics of conventional DCNs, which enables optimization of a global flow-aware goal of the network system.
In an implementation form of the first aspect, the ECN configuration for the switch port that optimizes the performance of the network system includes at least one of an ECN configuration for the switch port that decreases a flow completion time (FCT) of the network system, an ECN configuration for the switch port that reduces a buffer occupancy of the network system, and an ECN configuration for the switch port that increases a throughput of the network system.
In an implementation form of the first aspect, the network system is configured based on a DCTCP protocol or a DCQCN protocol.
That is, the agent according to this disclosure is compatible with any datacenter network running DCTCP or DCQCN protocols.
In the first aspect and its implementations, the functions described may be implemented in hardware, software, firmware, or any combination thereof.
According to a second aspect, a method for an agent for dynamically adapting an ECN configuration in a network system is provided. The network system includes a plurality of agents, each agent being associated with a switch port, and the method includes: sending a first message to one or more neighbor agents, the first message including a property of the switch port associated with the agent; receiving a second message from each neighbor agent of the one or more neighbor agents, the second message including a property of a neighbor switch port associated with the neighbor agent; and determining the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports.
This provides the advantage that the agent can independently adapt the ECN configuration of its associated port taking into consideration the internal properties (or attributes) of the switch port and the received communications (second messages) including the properties of neighboring ports.
In an implementation form of the second aspect, the property of the switch port includes at least one of: a transmission rate, a queue length, and a rate of ECN marked packets of the switch port; and/or the property of the neighbor switch port includes at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port.
In an implementation form of the second aspect, before sending the first message to one or more neighbor agents, the method further comprises collecting the property of the switch port associated with the agent.
The property of the switch port is commonly supported by commercial switches and can be locally obtained with low computational overhead at microsecond timescales.
In an implementation form of the second aspect, the method further comprises retrieving, from the second message, the property of the neighbor switch port associated with the neighbor agent.
This may enable the agent to additionally determine relevant information of the one or more switch ports received in the second messages.
In an implementation form of the second aspect, the method further comprises updating the property of the switch port associated with the agent by combining the property of the switch port and the properties of the one or more neighbor switch ports retrieved from the second message.
This provides the advantage that the agent may acquire valuable context from the rest of agents, which may be taken into account to adapt the ECN configuration of its respective switch port.
In an implementation form of the second aspect, the method further comprises generating an updated first message, where the updated first message includes the updated property of the switch port associated with the agent.
Thereby, the agent may further share with the one or more neighbor agents the property of its associated switch port, which already incorporates information on the properties of the neighbor switch ports.
In an implementation form of the second aspect, before determining the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the one or more neighbor switch ports, the method further comprises performing an iteration procedure, including performing a predefined number of iterations.
In each iteration the following steps are performed: sending the updated first message generated in a previous iteration to the one or more neighbor agents, or in the case of the first iteration sending the first message to the one or more neighbor agents, where the updated first message includes the updated property of the switch port associated with the agent; receiving an updated second message from each of the one or more neighbor agents generated in the previous iteration, or in the case of the first iteration receiving the second message from each of the one or more neighbor agents, where the updated second message includes an updated property of the neighbor switch port associated with the neighbor agent; retrieving, from the updated second message, the updated property of the neighbor switch port; updating the property of the switch port associated with the agent by combining the updated property of the switch port calculated in the previous iteration and the updated property of each neighbor switch port retrieved from the one or more second messages; and generating another updated first message including the updated property of the switch port associated with the agent.
At each message passing iteration, the property of the switch port associated with the agent is updated so that valuable context may be provided to the rest of the agents, facilitating cooperation between the agents and enabling each agent to adapt the ECN configuration of its corresponding switch port accordingly.
Notably, the predefined number of iterations is a low number, for example 2 or 3 iterations, because a diameter of conventional DCNs is limited. Further, the predefined number of iterations may be a constant value and, thus, there is no need to change the number of iterations when changes in traffic or in topology of the network system occur.
In an implementation form of the second aspect, determining the ECN configuration for the switch port associated with the agent based on the property of the switch port and the properties of the neighbor switch ports includes determining an optimal ECN configuration for the switch port based on the final property of the switch port, where the final property of the switch port includes the updated property of the switch port after the last iteration; alternatively, in the case of the first iteration, determining an optimal ECN configuration for the switch port based on the property of the switch port and the properties of the neighbor switch ports received in the one or more second messages.
In an implementation form of the second aspect, the determined optimal ECN configuration for the switch port includes an ECN configuration for the switch port that optimizes the performance of the network system.
This provides the advantage that the agent may cooperate with the one or more neighbor agents to efficiently adapt the ECN configuration of its associated switch port to the fast traffic dynamics of conventional DCNs, which enables optimization of a global flow-aware goal of the network system.
In an implementation form of the second aspect, the ECN configuration for the switch port that optimizes the performance of the network system includes at least one of an ECN configuration for the switch port that decreases an FCT of the network system, an ECN configuration for the switch port that reduces a buffer occupancy of the network system, and an ECN configuration for the switch port that increases a throughput of the network system.
In an implementation form of the second aspect, the network system is configured based on a DCTCP protocol or a DCQCN protocol.
That is, the method according to this disclosure is compatible with any datacenter network running DCTCP or DCQCN protocols.
According to a third aspect, a computer program product is provided, comprising instructions which, when the program is executed by a computer, cause the computer to perform the method according to the second aspect and its implementation forms.
The computer program product according to the third aspect includes the features of the corresponding implementation forms of the method of the second aspect.
The method according to the second aspect and the computer program product according to the third aspect and their implementation forms provide the same advantages and effects as described above for the agent of the first aspect and its respective implementation forms.
The solutions according to this disclosure provide a general agent for dynamically adapting the ECN configuration in a network system that achieves the following advantages: the agent is compatible with widely deployed ECN-based CC protocols (e.g., DCTCP, DCQCN) and, thus, may be directly deployed in other network systems with different traffic distributions. Moreover, the agent may determine the ECN configuration of its respective switch port, enabling the switch port to take actions while being aware of the properties of its neighbor switch ports. This may achieve a global cooperation among the agents through message communications between neighboring switches, as well as a global reward, which in turn may provide a better optimization potential than conventional solutions based on local rewards. Further, the agent may be implemented by neural network architectures, which may allow training in controlled testbeds and subsequent direct deployment in real DCNs, thereby greatly increasing the commercialization viability of the agent. Further, this disclosure may provide a distributed in-network agent for CC optimization in a network system that may be more scalable and more robust against possible failures and reconfigurations than centralized solutions, thereby reducing complexity.
It has to be noted that all devices, elements, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
BRIEF DESCRIPTION OF DRAWINGS
The above described aspects and implementation forms of the present disclosure will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
FIG. 1 shows a schematic diagram of an agent for dynamically adapting an ECN configuration in a network system according to this disclosure;
FIG. 2 schematically shows an exemplary network system including a plurality of agents, each agent being associated with a switch port, according to this disclosure;
FIG. 3 shows an exemplary flowchart for dynamically adapting an ECN configuration where the agent is implemented by a message passing neural network, according to this disclosure;
FIG. 4 shows an exemplary flowchart for dynamically adapting an ECN configuration where the agent is implemented in an artificial intelligence chip, according to this disclosure;
FIG. 5 shows a method for an agent for dynamically adapting an ECN configuration in a network system according to this disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
FIG. 1 shows a schematic diagram of an agent 100 for dynamically adapting an ECN configuration in a network system 1. The network system 1 comprises a plurality of agents 100, 110 and each agent 100, 110 is associated with a switch port 102, 112. In the embodiment of FIG. 1, the agent 100 is associated with the switch port 102, and each of the one or more neighbor agents 110 is associated with a switch port 112.
The agent 100 is configured to send a first message 104 to one or more neighbor agents 110. The first message 104 comprises a property 101 of the switch port 102 associated with the agent 100. The property 101 of the switch port 102 comprises any port-level metrics available at the switch port 102 that may help to optimize a global property of the network system 1, as will be explained later in this description. Thus, the property 101 of the switch port 102 comprises, for example but not as a limitation, at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the switch port 102. The first message 104 may be generated based on the property 101 of the switch port 102, for example, by a processing block of the agent 100 (optional, thus shown in dashed lines), which obtains the property 101 of the switch port 102. The agent 100 may comprise some internal attributes based on relevant metrics of its associated port 102 (e.g., utilization, queue length, etc.).
The agent 100 may be configured to collect the property 101 of its associated switch port 102 available at the switch port 102, for instance, at the optional processing block as indicated.
The property 101 of the switch port is commonly supported by commercial switches and can be locally obtained with low computational overhead at microsecond (μs) timescales.
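As an illustration of the collection step, the following minimal Python sketch shows one way the property 101 and the first message 104 could be represented. The class names `PortProperty` and `Message`, the field names, and the units are hypothetical choices made for this sketch; the disclosure names only the three metrics and does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PortProperty:
    """Port-level metrics named in the text; field names and units are assumptions."""
    tx_rate_gbps: float      # transmission rate of the switch port
    queue_len_kb: float      # instantaneous queue length of the switch port
    ecn_marked_rate: float   # fraction of packets that were ECN marked

    def as_vector(self) -> List[float]:
        # Flatten the metrics so they can be carried in a message payload.
        return [self.tx_rate_gbps, self.queue_len_kb, self.ecn_marked_rate]


@dataclass(frozen=True)
class Message:
    """Hypothetical container for a first or second message."""
    sender: str
    payload: List[float]


def build_first_message(agent_id: str, prop: PortProperty) -> Message:
    # The first message carries the property of the sender's own switch port.
    return Message(sender=agent_id, payload=prop.as_vector())


msg = build_first_message("A4", PortProperty(40.0, 120.0, 0.05))
print(msg.sender, msg.payload)
```

A neighbor agent would build its second message in the same way from the property of its own switch port.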
Then, the agent 100 is configured to receive a second message 114 from each neighbor agent 110 of the one or more neighbor agents 110. The second message 114 comprises a property 103 of a neighbor switch port 112 associated with the neighbor agent 110. The property 103 of the neighbor switch port 112 comprises any port-level metrics available at the neighbor switch port 112 that may help to optimize a global property of the network system 1, for example but not as a limitation, at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port 112. Thus, each neighbor agent 110 may comprise some internal attributes based on relevant metrics of its associated port 112 (e.g., utilization, queue length, etc.). At the neighbor agent 110, the second message 114 may be generated based on the property 103 of the neighbor switch port 112 in a manner similar to that in which the first message 104 is generated at the agent 100 (e.g., by an optional processing block at the neighbor agent 110).
The agent 100 may be further configured to retrieve, from each of the received second messages 114, the property 103 of the neighbor switch port 112 associated with the neighbor agent 110.
The agent 100 is configured to determine the ECN configuration 106 for the switch port 102 associated with it, based on the property 101 of the switch port 102 and the properties 103 of the one or more neighbor switch ports 112 received in the plurality of second messages 114 (in FIG. 1, as an example, only one neighbor agent 110 is shown).
Thereby, each agent 100, 110 independently adapts the ECN configuration of its associated port 102, 112, taking into consideration its internal properties (or attributes) and the received communications (second messages) from neighboring agents 110.
Before the determination of the ECN configuration 106 for the switch port 102 associated with the agent 100 based on the property 101 of the switch port 102 and the properties 103 of the one or more neighbor switch ports 112, the agent 100 may be configured to update the property 101 of the switch port 102 associated with the agent 100 by combining the property 101 of the switch port 102 and the properties 103 of the one or more neighbor switch ports 112 retrieved from the second message 114.
Combining the property 101 of the switch port 102 associated with the agent 100 and the properties 103 of the one or more neighbor switch ports 112 retrieved from the second message 114 may comprise, for example, aggregating the properties 103 of each of the one or more neighbor switch ports 112. This can be done through an element-wise operation (e.g., element-wise min or max).
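The element-wise aggregation mentioned above can be sketched as follows. The `aggregate` function implements the element-wise min or max over the neighbor properties. The `combine` function, which averages the agent's own property with the aggregated one, is purely an assumption for illustration, since the text does not fix how the two are merged.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def aggregate(neighbor_props: Sequence[Vector],
              op: Callable[[Sequence[float]], float] = max) -> Vector:
    """Element-wise reduction (e.g. min or max) over the neighbor property vectors."""
    if not neighbor_props:
        raise ValueError("at least one neighbor property is required")
    # zip(*...) groups the i-th element of every neighbor vector together.
    return [op(values) for values in zip(*neighbor_props)]


def combine(own: Vector, aggregated: Vector) -> Vector:
    """Hypothetical update rule: element-wise mean of own and aggregated vectors."""
    return [(o + a) / 2.0 for o, a in zip(own, aggregated)]


own = [40.0, 120.0, 0.05]                       # property 101 of port 102
neighbors = [[10.0, 300.0, 0.20],               # properties 103 retrieved from
             [25.0, 80.0, 0.00]]                # two second messages 114

agg_max = aggregate(neighbors, max)             # element-wise maximum
agg_min = aggregate(neighbors, min)             # element-wise minimum
updated = combine(own, agg_max)                 # updated property 101
print(agg_max, agg_min, updated)
```

Element-wise reductions such as min and max do not depend on the number or order of neighbors, so the same update can run at ports with different numbers of neighbor agents.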
Then, the agent 100 may be configured to generate an updated first message 104. The updated first message 104 may comprise the updated property 101 of the switch port 102 associated with the agent 100.
The agent 100 is further configured to perform an iteration procedure. The iteration procedure comprises performing a predefined number of iterations.
Notably, the predefined number of iterations is a low value, for example 2 or 3 iterations, because a diameter of conventional DCNs is limited. For example, for a DCN with up to three layers, a suitable number of iterations may be equal to three. Further, the predefined number of iterations may be a constant value; hence, there is no need to change the number of iterations when changes in traffic or in topology of the network system 1 occur.
Each iteration comprises sending, by the agent 100, the updated first message 104 to the one or more neighbor agents 110, and receiving, by the agent 100, an updated second message 114 from each of the one or more neighbor agents 110. At the first iteration, the updated first message 104 comprises the first message 104 and the updated second message 114 comprises the second message 114, as disclosed above. Otherwise, the updated first message 104 comprises the updated property 101 of the switch port 102 associated with the agent 100, as generated in a previous iteration, and the updated second message 114 comprises an updated property 103 of the neighbor switch port 112 associated with the neighbor agent 110, as generated in a previous iteration.
Further, each iteration comprises retrieving, by the agent 100, from the updated second message 114, the updated property 103 of the neighbor switch port 112. Then, each iteration comprises updating, by the agent 100, the property 101 of the switch port 102 associated with the agent 100 by combining the updated property 101 of the switch port 102 calculated in the previous iteration and the updated property 103 of each neighbor switch port 112 retrieved from the one or more second messages 114.
Next, each iteration comprises generating, by the agent 100, another updated first message 104 comprising the updated property 101 of the switch port 102 associated with the agent 100.
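The iteration procedure described above can be sketched as a synchronous loop over all agents. This is a minimal sketch under stated assumptions: the function name `run_message_passing`, the three-agent topology, the single-element property (a queue length), and the element-wise max update are all hypothetical and chosen only to make the propagation visible.

```python
from typing import Dict, List

Vector = List[float]


def run_message_passing(props: Dict[str, Vector],
                        neighbors: Dict[str, List[str]],
                        num_iterations: int) -> Dict[str, Vector]:
    """Run a predefined number of synchronous message passing iterations."""
    state = {agent: list(vec) for agent, vec in props.items()}
    for _ in range(num_iterations):
        # Each agent sends its current (updated) property as its message.
        outbox = {agent: list(vec) for agent, vec in state.items()}
        next_state: Dict[str, Vector] = {}
        for agent, own in state.items():
            # Retrieve the properties carried by the neighbors' messages.
            received = [outbox[n] for n in neighbors[agent]]
            # Assumed update rule: element-wise max over own and received vectors.
            next_state[agent] = [max(vals) for vals in zip(own, *received)]
        state = next_state
    return state


# Hypothetical three-agent chain: leaf1 - spine - leaf2.
neighbors = {"leaf1": ["spine"], "spine": ["leaf1", "leaf2"], "leaf2": ["spine"]}
props = {"leaf1": [10.0], "spine": [20.0], "leaf2": [90.0]}

after_one = run_message_passing(props, neighbors, 1)
after_two = run_message_passing(props, neighbors, 2)
print(after_one["leaf1"], after_two["leaf1"])
```

In this sketch, `leaf1` learns about the long queue at `leaf2` only after the second iteration, because the two agents are two hops apart. This is why the number of iterations can be tied to the network diameter.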
The determination of the ECN configuration 106 for the switch port 102 associated with the agent 100 based on the property of the switch port 102 and the properties 103 of the neighbor switch ports 112 comprises determining, by the agent 100, an optimal ECN configuration for the switch port 102 based on the final property of the switch port 102, where the final property 101 of the switch port 102 comprises the updated property 101 of the switch port 102 after the last iteration.
Alternatively, in the case of the first iteration, the determination of the ECN configuration 106 for the switch port 102 associated with the agent 100 based on the property of the switch port 102 and the properties 103 of the neighbor switch ports 112 comprises determining an optimal ECN configuration for the switch port 102 based on the property 101 of the switch port 102 and the properties 103 of the neighbor switch ports 112 received in the one or more second messages 114.
Additionally or alternatively, the determination of the ECN configuration 106 for the switch port 102 associated with the agent 100 based on the property of the switch port 102 and the properties of the neighbor switch ports 112 may comprise determining, by the agent 100, an optimal ECN configuration for the switch port 102 associated with it based on the property 101 of the switch port 102 updated in each iteration.
Thereby, at each message passing iteration, the property 101 of the switch port 102 is optimized so as to provide valuable context to the neighbor agents 110, facilitating cooperation between them and enabling each agent 100, 110 to optimally adapt the ECN settings 106 of its respective switch port 102, 112 accordingly. That is, each agent 100, 110 can independently adapt the ECN configuration of its associated port 102, 112, taking into consideration its internal attributes and the properties 103 of the one or more neighbor switch ports 112 received in the communications exchanged with the neighbor agents 110. Those internal attributes may be used to update the properties of each switch port 102, 112, and the agents 100, 110 actually share the updated properties 101, 103 of the respective switch port 102, 112 with their neighbors 110 via messages 104, 114 over the network 1.
The determined optimal ECN configuration 106 for the switch port 102 comprises an ECN configuration 106 for the switch port 102 that optimizes the performance of the network system 1.
The distributed ECN configuration 106 for the switch port 102 makes the solution according to this disclosure highly scalable to large networks.
The ECN configuration 106 for the switch port 102 that optimizes the performance of the network system 1 comprises at least one of an ECN configuration 106 for the switch port 102 that decreases a flow completion time (FCT) of the network system 1, an ECN configuration for the switch port 102 that reduces a buffer occupancy of the network system 1, and an ECN configuration for the switch port 102 that increases a throughput of the network system 1.
In determining the optimal ECN configuration 106 for the switch port 102 based on the property 101 of the switch port 102 and the properties 103 of the neighbor switch ports 112 received in the one or more second messages 114, the agent 100 may use any suitable optimization method that enables optimization of a global flow-level metric of the network system 1 (e.g., the FCT, the buffer occupancy and/or the throughput of the network 1). Thereby, the message communications between the plurality of agents 100, 110 of the network system 1 enable a global cooperation between the agents 100, 110, as they can take actions based not only on the local properties of the respective switch ports 102, 112, but also on the properties of their surrounding (neighbor) switch ports 102, 112.
Further, each of the one or more neighbor agents 110 is configured to perform exactly the same process as described above for the agent 100, so that all the agents 100, 110 move from a local to a global context awareness of the network 1.
The network system 1 is configured based on a DCTCP protocol or a DCQCN protocol. That is, the plurality of agents 100, 110 according to this disclosure are compatible with any datacenter network running DCTCP or DCQCN protocols.
By dynamically adapting the ECN configuration of the switch port 102, the agent 100 according to this disclosure may further optimize CC in the network system 1.
The agent 100 according to this disclosure, thus, can optimize the handling of congestion notifications within switches 102, 112, which is a crucial component of CC protocols to efficiently optimize traffic in the network system 1. In this context, both DCTCP and DCQCN protocols implement a similar approach: switches mark a Congestion Experienced (CE) bit of packets in case the queue length exceeds some predefined thresholds.
The DCTCP protocol adopts a hard cutoff, for example, all packets are marked when the queue length exceeds a certain value k. Instead, the DCQCN protocol implements a softer RED-like probabilistic approach with three configuration parameters {k_min, k_max, p_max}, given in equation (1):
$$p_{mark} = \begin{cases} 0, & q_{len} \le k_{min} \\ \dfrac{q_{len} - k_{min}}{k_{max} - k_{min}}\, p_{max}, & k_{min} < q_{len} \le k_{max} \\ 1, & q_{len} > k_{max} \end{cases} \qquad (1)$$
where q_len is the instantaneous queue length in the port, and p_mark is the probability of marking a packet.
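The two marking rules can be sketched in Python as follows. The sketch assumes the standard RED-style profile commonly associated with DCQCN: no marking up to the lower threshold, a linear ramp up to the maximum probability at the upper threshold, and marking of every packet beyond it. The function names and the exact handling of the boundary values are assumptions of this sketch.

```python
def dctcp_mark(q_len: float, k: float) -> bool:
    """Hard cutoff: mark every packet once the queue length exceeds k."""
    return q_len > k


def dcqcn_mark_probability(q_len: float, k_min: float,
                           k_max: float, p_max: float) -> float:
    """RED-like marking probability; boundary handling is an assumption."""
    if not (0 <= k_min < k_max) or not (0.0 <= p_max <= 1.0):
        raise ValueError("require 0 <= k_min < k_max and 0 <= p_max <= 1")
    if q_len <= k_min:
        return 0.0                      # below the lower threshold: never mark
    if q_len <= k_max:
        # Linear ramp from 0 at k_min up to p_max at k_max.
        return p_max * (q_len - k_min) / (k_max - k_min)
    return 1.0                          # above the upper threshold: always mark


# Hypothetical thresholds (in KB) for illustration only.
low = dcqcn_mark_probability(50.0, 100.0, 400.0, 0.2)
mid = dcqcn_mark_probability(250.0, 100.0, 400.0, 0.2)
high = dcqcn_mark_probability(500.0, 100.0, 400.0, 0.2)
print(low, mid, high, dctcp_mark(250.0, 200.0))
```

An agent that adapts the ECN configuration would, in these terms, choose the value of `k` for DCTCP or the values of `k_min`, `k_max` and `p_max` for DCQCN.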
Thus, when the network system 1 according to this disclosure is configured based on a DCTCP protocol, the determined ECN configuration 106 for the switch port 102 associated with the agent 100 may comprise, for example and not as a limitation, a value k that optimizes the performance of the network system 1 (i.e., that optimizes at least one of the FCT, the buffer occupancy and the throughput of the network system 1).
When the network system 1 according to this disclosure is configured based on a DCQCN protocol, the determined ECN configuration 106 for the switch port 102 associated with the agent 100 may comprise, for example and not as a limitation, values of the quantities k_min, k_max, p_max and/or p_mark that optimize the performance of the network system 1 (i.e., that optimize at least one of the FCT, the buffer occupancy and the throughput of the network system 1).
The agent 100 may comprise processing circuitry (not shown) configured to perform, conduct or initiate the various operations of the agent 100 described herein. The processing circuitry may comprise hardware and software. The hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry. The digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors. The agent 100 may further comprise memory circuitry, which stores one or more instruction(s) that can be executed by the processor or by the processing circuitry, in particular under control of the software. For instance, the memory circuitry may comprise a non-transitory storage medium storing executable software code which, when executed by the processor or the processing circuitry, causes the various operations of the agent 100 to be performed. In one embodiment, the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors. The non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the agent 100 to perform, conduct or initiate the operations or methods described herein.
In this exemplary embodiment, the agents 100, 110 may enable in-network optimization at intermediate switch ports 102, 112. Further, the agent 100 may provide a fully decentralized mechanism. The only observable mechanism in the network system 1, thus, may be the message communications between neighboring agents 100, 110. The first message 104 and the second message 114 provide contextual information to the agent 100 and, moreover, a pattern in which the first message 104 and the second message 114 are sent may be recognizable.
The agents 100, 110 of this exemplary embodiment may reduce the complexity of the network system 1 by providing a solution that is more scalable and more robust against possible failures and reconfigurations than conventional centralized solutions. Further, as the agent 100 is compatible with widely deployed ECN-based CC protocols (e.g., DCTCP or DCQCN), the agent 100 may be directly deployed in other network systems 1 with different traffic distributions, increasing the efficiency and the commercialization viability of the agent 100.
Notably, the agent 100 may provide topology-aware local context, as it may enable its associated switch port 102 to take actions (for example, to determine whether a packet may be marked) while being aware of the properties 103 of its neighbor switch ports 112. Thereby, global cooperation is achieved through message communications between the agent 100 and the one or more neighbor agents 110. Additionally, a global reward may be achieved, since the agent 100 may determine the ECN configuration for the switch port 102 that optimizes a goal of the whole network system 1, which may achieve better results than conventional solutions based on local rewards.
FIG. 2 schematically depicts an exemplary network system 1 according to this disclosure. Same elements are labelled with the same reference signs.
The exemplary network system 1 comprises a plurality of agents 100, 110, exemplary agents A 1 to A 9, and each agent 100, 110 is associated with a switch port 102, 112. Each agent 100, exemplary agent A 4, may be configured to send a first message 104 to one or more neighbor agents 110, exemplary agents A 2, A 3, A 5, A 7 and A 9, and to receive a second message 114 from each of the one or more neighbor agents 110 A 2, A 3, A 5, A 7 and A 9, where the first message 104 comprises a property 101 of the switch port 102 associated with the agent 100 and each of the second messages 114 comprises a property 103 of the neighbor switch port 112 associated with the neighbor agent 110.
In the network system 1 according to this disclosure, each port 102, 112 can be an egress port. Further, there can be as many agents 100, 110 as ports in each switch, so that each agent 100, 110 can control the ECN configuration of a particular port.
In the example of FIG. 2, each agent 100, exemplary agent A 4, may also send the first message 104 to other exemplary agents, for example to agents A 1, A 6 and/or A 8, and may also receive the second message 114 from them. This is not limited in this disclosure.
The exemplary agents A 1, A 4, A 6 and A 8 may form a spine of the network system 1, and the exemplary agents A 2, A 3, A 5, A 7 and A 9 may form a leaf of the network system 1.
The exemplary network system 1 of FIG. 2 may be configured based on a DCTCP protocol or a DCQCN protocol.
The agent 100 according to this disclosure may be implemented in a type of Graph Neural Network (GNN) called a Message Passing Neural Network (MPNN). This MPNN architecture may help the agents 100, 110 to process and model information of the state of the network system 1, and may enable propagating local information over the whole network system 1 by allowing message communications between agents 100, 110, so that they can update their internal attributes depending on the second message 114 received from each neighbor agent 110. Furthermore, there is a direct mapping between the MPNN execution and the actual network system 1, namely that switch ports 102, 112 can be represented as node/link entities in the GNN, and the information exchanged internally by port-level agents 100, 110 can be transmitted via messages 104, 114 sent through the network system infrastructure.
Then, by providing proper rewards for the actions that the agents 100 can take, each port-level agent 100, 110 may eventually learn what information is relevant to exchange with its neighbor agents 110, and may also discover a manner to adapt the ECN configuration 106 of its corresponding switch port 102 to optimize the global objective (e.g., minimize the FCT, minimize the buffer occupancy, and/or maximize the throughput of the network system 1). As all the agents 100, 110 have the same internal parameters (i.e., the respective switch ports 102, 112 have the same type of properties), the agents 100, 110 can optimize the property 101, 103 of the respective switch ports 102, 112 jointly. After training, the agent 100 implemented in MPNN can be replicated and deployed across distributed devices of any other DCN.
FIG. 3 depicts an exemplary embodiment of a flowchart performed by the agent 100 according to this disclosure. Same elements are labelled with the same reference signs. In the embodiment of FIG. 3, the agent 100 is implemented in an MPNN.
In the embodiment of FIG. 3, each of the plurality of agents 100, 110, i.e., the agent 100 and the one or more neighbor agents 110, may be implemented in an MPNN and may achieve the same processes as disclosed in the following for the agent 100. The first message 104 may comprise the property 101 of the switch port 102 associated with the agent 100 in the form of a hidden state of the agent 100, and each of the second messages 114 may comprise the property 103 of the neighbor switch port 112 associated with the neighbor agent 110 in the form of a hidden state of the neighbor agent 110.
In this exemplary embodiment, the agent 100 may comprise three main modules: an Information Retrieval Module 302, a Hidden State Generation Module 304, and an Action Selection Module 306.
The Information Retrieval Module 302 may receive the second message 114 from each of the one or more neighbor agents 110. The Information Retrieval Module 302 can be implemented by a Neural Network (NN) module that may be configured to process the second messages 114 received from the one or more neighbor agents 110, and to further select relevant information of the hidden states of the neighbor agents 110 that should be considered.
The Information Retrieval Module 302 may be further configured to aggregate the selected relevant information of the hidden states of the neighbor agents 110. The aggregation can be done, for example, through an element-wise operation (for example element-wise min or max) .
Further, the Information Retrieval Module 302 may be configured to output the aggregated information of the hidden states of the neighbor agents 110.
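Purely as an illustration and not as a limitation, the element-wise aggregation mentioned above may be sketched as follows. The function name `aggregate`, the `mode` parameter and the three-element example vectors are assumptions made for this sketch and are not taken from the disclosure; the selection of relevant information by the NN module is assumed to have already taken place.

```python
from typing import List, Sequence


def aggregate(messages: Sequence[Sequence[float]], mode: str = "max") -> List[float]:
    """Element-wise min or max over equally sized neighbor vectors (hypothetical helper)."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    if not messages:
        raise ValueError("at least one neighbor message is required")
    size = len(messages[0])
    if any(len(m) != size for m in messages):
        raise ValueError("all messages must have the same dimension")
    pick = max if mode == "max" else min
    # zip(*messages) groups the i-th element of every neighbor vector together.
    return [pick(column) for column in zip(*messages)]


# Illustrative (made-up) hidden-state excerpts from three neighbor agents.
neighbor_states = [
    [0.2, 0.9, 0.0],
    [0.5, 0.1, 0.3],
    [0.4, 0.4, 0.1],
]
aggregated_max = aggregate(neighbor_states, "max")
aggregated_min = aggregate(neighbor_states, "min")
```

An element-wise min or max yields an output of the same fixed size regardless of the number or ordering of the neighbor agents, which is one reason such operations are commonly used for aggregation in message passing networks.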
The output of the Information Retrieval Module 302 may be used by the Hidden State Generation Module 304. The Hidden State Generation Module 304 can be based on a NN, and can be configured to combine the aggregated selected information of the hidden states of the neighbor agents 110 with the hidden state of the agent 100, thus generating a new or updated hidden state for the agent 100.
The Information Retrieval Module 302 and the Hidden State Generation Module 304 may together form a Communication Module 308 that corresponds to the operation of the MPNN. In variants of the MPNN, the functions executed by the Information Retrieval Module 302 and the Hidden State Generation Module 304 are not necessarily modeled by a NN.
Then, the Communication Module 308, additionally or alternatively the Hidden State Generation Module 304, may be configured to generate an updated first message 104 comprising the updated hidden state for the agent 100, and to subsequently send the updated first message 104 to the one or more neighbor agents 110. The Communication Module 308 may be executed for a predefined number of iterations.
Each of the one or more neighbor agents 110 may perform exactly the same process, so that all the agents 100, 110 move from a local to a global context awareness.
After the iterative execution of the Communication Module 308, the Action Selection Module 306 of the agent 100 may be configured to determine an optimal ECN configuration for the switch port 102 associated with the agent 100 based on the final hidden state of the agent 100, which takes into account the updated hidden state of each of the one or more neighbor agents 110. The final hidden state of the agent 100 may comprise the updated hidden state of the agent 100 after the last iteration.
The Action Selection Module 306 can be implemented by a NN that produces, as an output, a final action policy as depicted in FIG. 3, where the final action policy may comprise the ECN configuration 106 of the switch port 102 associated with the agent 100.
The determined optimal ECN configuration for the switch port 102 associated with the agent 100 may comprise an ECN configuration 106 for the switch port 102 that optimizes the performance of the network system 1. The ECN configuration 106 for the switch port 102 that optimizes the performance of the network system 1 comprises at least one of the ECN configuration 106 for the switch port 102 that decreases the FCT of the network system 1, the ECN configuration for the switch port 102 that reduces the buffer occupancy of the network system 1, and the ECN configuration for the switch port 102 that increases the throughput of the network system 1.
In this exemplary embodiment, the hidden state representations of the agents 100, 110 are central elements, since they may be the actual content of message communications. The hidden state of each agent 100, 110 needs to encode not only the discriminant features of that agent 100, 110 required to define its individual behavior, but also relevant information about the context to facilitate the action decision making of neighboring agents 110.
The general flowchart performed by this exemplary embodiment of the agent 100 is now described. In a first step, the agent 100 may collect some port-level metrics available at its corresponding switch port 102, for example and not as a limitation, at least one of the transmission rate or utilization of the switch port 102, the instantaneous queue length and the rate of ECN-marked packets in the switch port 102. These metrics represent the input features depicted in FIG. 3.
In a second step, the agent 100 may initialize its hidden state based on its corresponding input features. That is, the input features are encoded into a hidden state of the agent 100, and the hidden state of the agent 100 may be a fixed-size vector. The initialization may be performed, for example, by adding the input features and applying zero-padding to fit a dimension (i.e., the fixed size) of the hidden state vector for the agent 100.
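As a minimal sketch of this initialization, assuming that the input features are simply placed at the start of the vector and the remainder is filled with zeros, the second step may look as follows. The hidden state size of 8, the name `init_hidden_state` and the example feature values are hypothetical.

```python
from typing import List, Sequence

HIDDEN_SIZE = 8  # assumed fixed dimension of the hidden state vector


def init_hidden_state(features: Sequence[float], size: int = HIDDEN_SIZE) -> List[float]:
    """Copy the port-level input features and zero-pad up to the fixed size."""
    if len(features) > size:
        raise ValueError("more input features than hidden state dimensions")
    return list(features) + [0.0] * (size - len(features))


# Made-up input features: port utilization, normalized queue length, ECN mark rate.
state = init_hidden_state([0.75, 0.10, 0.02])
```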
In a third step, the agent 100 may send the first message 104 to the one or more neighbor agents 110. The first message 104 may comprise the hidden state of the agent 100. Given that each agent 100, 110 represents an egress port 102, 112, the agent 100 may actually send the first message 104 to all ports 112 that can potentially receive traffic from it.
In a fourth step, the agent 100 may receive the hidden states of all of its neighbor agents 110, and may process them as follows:
● First, the hidden state of each neighbor agent 110 may be combined with the hidden state of the agent 100 through a Message Function 310. The Message Function 310 can be implemented by a NN module, for example by the Information Retrieval Module 302.
● Then, the combined hidden state of the agent 100 and hidden states of the one or more neighbor agents 110 may be aggregated using a predefined Aggregation Function 312, e.g., an element-wise max or min function.
● Next, the agent 100 may update its own hidden state with the aggregated hidden states. This task can be performed by an Update Function 314, which may be implemented by a NN module, for example by the Hidden State Generation Module 304.
Thereby, the agent 100 may generate an updated hidden state of the agent 100 that potentially incorporates information of the one or more neighbor agents 110.
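The processing of the fourth step may be sketched, for illustration only, as one round of message, aggregation and update functions. In this sketch the NN-based Message Function 310 and Update Function 314 are replaced by a simple element-wise mean, which is an assumption chosen only to keep the example self-contained; a trained agent would use learned NN modules instead.

```python
from typing import List, Sequence

Vector = List[float]


def message_function(own: Sequence[float], neighbor: Sequence[float]) -> Vector:
    # Placeholder for the NN-based Message Function 310: element-wise mean.
    return [(a + b) / 2.0 for a, b in zip(own, neighbor)]


def aggregation_function(messages: Sequence[Sequence[float]]) -> Vector:
    # Aggregation Function 312: element-wise max over all combined states.
    return [max(column) for column in zip(*messages)]


def update_function(own: Sequence[float], aggregated: Sequence[float]) -> Vector:
    # Placeholder for the NN-based Update Function 314: element-wise mean.
    return [(a + b) / 2.0 for a, b in zip(own, aggregated)]


def message_passing_round(own: Sequence[float],
                          neighbor_states: Sequence[Sequence[float]]) -> Vector:
    """Combine with each neighbor, aggregate, then update the own hidden state."""
    messages = [message_function(own, n) for n in neighbor_states]
    return update_function(own, aggregation_function(messages))


# Two-dimensional toy hidden states (made-up values).
updated = message_passing_round([1.0, 0.0], [[0.0, 2.0], [3.0, 1.0]])
```

With these toy values the combined states are [0.5, 1.0] and [2.0, 0.5], their element-wise max is [2.0, 1.0], and the updated hidden state is [1.5, 0.5].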
In a fifth step, the message exchange and processing described in the third and fourth steps may be repeated for a predefined number of iterations. In each iteration, the updated hidden states of the one or more neighbor agents 110 received in the one or more second messages 114 may be considered. During these iterations:
● There may be a recognizable pattern of periodic message communications between neighboring agents 110.
● Assuming that the updated hidden state of the agent 100 gains some local topological awareness through the updated hidden states of the neighbor agents 110 received in the second messages 114, the range of the network 1 covered by the hidden state of the agent 100 expands at successive message passing iterations. At each iteration, the agent 100 may have access to information from more distant points (agents 110) in the network 1.
● Therefore, it may be expected that the hidden state of the agent 100 and the hidden states of the neighbor agents 110 may evolve from sparse data to much more dense representations in hidden state vectors as the iterations of the message passing procedure are executed.
The number of iterations may be predefined, and typically is a low value, e.g. 2 or 3 iterations, because the diameter of DCNs is limited.
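The relation between the number of iterations and the network diameter can be illustrated with a small sketch that tracks, for each agent, the set of agents whose information has reached it. For brevity the sketch assumes one agent per switch in a hypothetical two-tier topology with two spine switches (S1, S2) and three leaf switches (L1 to L3), whereas the disclosure associates agents with individual switch ports; the names and the topology are assumptions.

```python
from typing import Dict, List, Set


def propagate(neighbors: Dict[str, List[str]], iterations: int) -> Dict[str, Set[str]]:
    """Return, per agent, the set of agents whose information it has incorporated."""
    known = {agent: {agent} for agent in neighbors}
    for _ in range(iterations):
        # Synchronous round: every agent reads its neighbors' previous sets.
        known = {
            agent: known[agent].union(*(known[n] for n in neighbors[agent]))
            for agent in neighbors
        }
    return known


# Hypothetical leaf-spine topology: every leaf is connected to every spine.
topology = {
    "S1": ["L1", "L2", "L3"],
    "S2": ["L1", "L2", "L3"],
    "L1": ["S1", "S2"],
    "L2": ["S1", "S2"],
    "L3": ["S1", "S2"],
}
after_one = propagate(topology, 1)
after_two = propagate(topology, 2)
```

In this topology the longest shortest path has two hops, so after one iteration a leaf has only heard from the two spines, and after two iterations every agent has received information originating from every other agent.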
In a sixth step, at the end of the message passing, the agent 100 may individually evaluate its final hidden state through a Readout Function 316. The Readout Function 316 can be implemented by another NN module, for example the Action Selection Module 306.
The Readout Function 316 may provide the agent 100 with its action policy, i.e., with the optimal ECN configuration for the switch port 102 associated with the agent 100.
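One possible shape of such a readout, given only as a hedged sketch, is a scoring layer over a discrete set of candidate ECN configurations followed by an arg-max. The discrete action set, the candidate values, the weight matrix and the name `readout` are all assumptions for illustration; the disclosure only states that the Readout Function 316 can be implemented by a NN module, and in practice the weights would be learned during training.

```python
from typing import List, Sequence, Tuple

EcnConfig = Tuple[int, int, float]  # assumed layout: (k_min, k_max, p_max)

# Hypothetical candidate configurations and untrained placeholder weights.
CANDIDATES: List[EcnConfig] = [(5, 200, 0.01), (50, 400, 0.1), (100, 800, 0.5)]
WEIGHTS: List[List[float]] = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]


def readout(hidden: Sequence[float],
            weights: Sequence[Sequence[float]] = WEIGHTS,
            candidates: Sequence[EcnConfig] = CANDIDATES) -> EcnConfig:
    """Score each candidate with a linear layer and return the best-scoring one."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best]


# Final hidden state of a toy two-dimensional agent (made-up values).
chosen = readout([1.0, 0.0])
```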
The exemplary implementation of the agent 100 by MPNN may be trained using, for example and not as a limitation, a Reinforcement Learning algorithm.
In this exemplary embodiment, the agent 100 may be trained offline (e.g., in a controlled testbed) and can then be directly deployed in production networks, without the need for re-training on premises. Hence, it may avoid the uncertainty of online training in production networks. In addition, offline training may enable extensive testing before deployment, with the possibility of issuing certifications determining the operational ranges that the agent 100 can safely support (e.g., link capacities or maximum network size). This certification process is aligned with the conventional way in which networking products are currently commercialized.
Further, during training, the agent 100 may learn how to communicate with the one or more neighbor agents 110 to dynamically adapt the ECN configuration 106 of its respective switch port 102 so as to optimize a global flow-level metric (e.g., FCT) of the network system 1. The message communications may enable a global cooperation between the agents 100, 110, as they can take actions based not only on their local state information, but also on data from their neighbor agents 110.
In this exemplary embodiment, the agent 100 is compatible with any data center network running DCTCP or DCQCN protocols. That is, when the agent 100 and the neighbor agents 110 are implemented by MPNN, the network system 1 may be configured based on a DCTCP protocol or a DCQCN protocol.
The determined ECN configuration 106 for the switch port 102 associated with the agent 100 may comprise, for example and not as a limitation, a value k that optimizes the performance of the network system 1 configured based on a DCTCP protocol. As a further example, the determined ECN configuration 106 for the switch port 102 associated with the agent 100 may comprise values of the quantities k_min, k_max, p_max and/or p_mark that optimize the performance of the network system 1 configured based on a DCQCN protocol, as depicted in FIG. 3.
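For context, the way such thresholds are commonly applied at a switch queue may be sketched as follows: DCTCP-style marking compares the queue length against a single threshold k, and DCQCN-style marking follows a RED-like profile in which the marking probability rises linearly from zero at a lower threshold to a maximum value at an upper threshold and is one above it. The function names, the units and the example numbers are assumptions, and the quantity p mark of the disclosure is not modeled here.

```python
def dctcp_should_mark(queue_len: float, k: float) -> bool:
    """Single-threshold marking: mark when the queue exceeds k."""
    return queue_len > k


def red_mark_probability(queue_len: float, k_min: float, k_max: float,
                         p_max: float) -> float:
    """RED-like marking profile: 0 up to k_min, linear up to p_max at k_max, then 1."""
    if not k_min < k_max:
        raise ValueError("k_min must be smaller than k_max")
    if queue_len <= k_min:
        return 0.0
    if queue_len > k_max:
        return 1.0
    return p_max * (queue_len - k_min) / (k_max - k_min)


# Example thresholds in arbitrary queue-length units (made-up values).
p_low = red_mark_probability(5, k_min=10, k_max=50, p_max=0.5)
p_mid = red_mark_probability(30, k_min=10, k_max=50, p_max=0.5)
p_high = red_mark_probability(60, k_min=10, k_max=50, p_max=0.5)
```

Lower thresholds cause earlier marking and therefore shorter queues at the cost of throughput, while higher thresholds do the opposite, which is the trade-off that the determined ECN configuration 106 is meant to balance.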
Recent works have focused on the use of machine learning (ML) techniques to produce data-driven solutions that can efficiently adapt to the network dynamics. Conventional ML-based CC proposals include, for example, AuTO, Aurora and Orca. However, these solutions are not compatible with widely deployed equipment in datacenters, as they propose to re-implement the network stack. A more recent solution, an automatic run-time optimization scheme (ACC), proposes to perform in-network optimization by dynamically adapting the ECN configuration on switches. This solution has shown a remarkable performance in production environments and it is compatible with current datacenter equipment running widely deployed ECN-based CC protocols (e.g., DCTCP, DCQCN). Nevertheless, ACC is designed for online training and, thus, it needs to be re-trained to gradually learn how to adapt to the current network conditions. In general, online training is not always appropriate in production environments, as it carries an implicit uncertainty about the resulting performance of the agents after re-training.
Thus, this exemplary embodiment of the agent 100 may alleviate the problems and disadvantages of conventional ML-based and NN-based solutions for DCNs.
FIG. 4 shows an exemplary embodiment of a flowchart performed by the agent 100 according to this disclosure. Same elements are labelled with the same reference signs.
In the exemplary embodiment of FIG. 4, the agent 100 and the neighbor agents 110 may be implemented, for example and not as a limitation, in chips 400, 410 optimized for Artificial Intelligence (AI) applications. The chips 400, 410 may be embedded in switch ports 102, 112 associated with the agents 100, 110. Each chip 400, 410 may comprise a network interface card (NIC).
In a first step, the agent 100 may retrieve the property 101 of the switch port 102 associated with it from its local NIC. The property 101 of the switch port 102 may comprise any port-level metrics available at the switch port 102 that may help to optimize a global property of the network system 1, for example and not as a limitation, at least one of a transmission rate of the switch port 102 in the form of bytes transmitted by the NIC, depicted as tx rate in FIG. 4, a queue length, q len, of the switch port 102, and a rate of ECN marked packets, ECN marks, of the switch port 102. These properties are commonly supported by commercial switches, and can be locally obtained by the agent 100 with low computational overhead at timescales of the order of microseconds.
In a second step, the agent 100 may start a communication with the one or more neighbor agents 110 deployed in neighbor switches 112 to gain a local context. This communication is done by sending the first message 104 to the one or more neighbor agents 110.
In this embodiment, the communications may be performed, for example and not as a limitation, through NN-driven message passing, as disclosed above for the exemplary implementation of the agent 100 by MPNN depicted in FIG. 3, where the agents 100, 110 exchange the first message 104 and the second messages 114 directly encoded by NN modules in order to find the optimal ECN configuration 106 for the switch port 102 in its associated NIC.
This communication only requires exchanging a few bytes with neighboring switches 112 and can take a few microseconds (μs) . The base link propagation delay in production DCNs is typically of the order of 1 μs.
To set the optimal ECN configuration 106, for example the quantities k_min, k_max, p_max and/or p_mark in the case that the network system 1 is configured based on a DCQCN protocol, the agent 100 may directly interface with forwarding chips using a respective Application Programming Interface (API), which is typically vendor-specific.
In this exemplary embodiment, the agent 100 may be further configured to mark packets according to the determined optimal ECN configuration 106 for the switch port 102.
Further, in this exemplary embodiment, the agent 100 may be configured to act as an end-host.
In a third step, a CC protocol can be executed at the end-host. The end-host may adjust the flow transmission rate of the respective switch port 102, 112 based on ECN feedback. This process may be as follows: if a host receives an ECN-marked packet, the host may notify the sender of this by using an acknowledgement (ACK).
When a sender receives the ACK, it may re-compute the flow rate of its respective switch port 102 according to a protocol-specific algorithm. The protocol-specific algorithm may comprise, for example, an Additive-Increase/Multiplicative-Decrease protocol.
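As one concrete instance of such a sender-side reaction, the window update used by DCTCP (as specified in RFC 8257) may be sketched as follows: the sender keeps a running estimate alpha of the fraction of ECN-marked bytes and, when marks were observed, reduces its congestion window in proportion to alpha instead of halving it. The function names, the additive increase of one segment per round trip and the window floor are simplifying assumptions of this sketch; DCQCN uses a different, rate-based update that is not shown.

```python
def update_alpha(alpha: float, marked_fraction: float, g: float = 1.0 / 16.0) -> float:
    """Exponentially weighted estimate of the fraction of ECN-marked bytes."""
    return (1.0 - g) * alpha + g * marked_fraction


def next_cwnd(cwnd: float, alpha: float, saw_marks: bool,
              increase: float = 1.0, floor: float = 1.0) -> float:
    """Multiplicative decrease scaled by alpha on marks, additive increase otherwise."""
    if saw_marks:
        return max(floor, cwnd * (1.0 - alpha / 2.0))
    return cwnd + increase


# One observation window in which every byte was marked, with a large gain g
# chosen only to keep the arithmetic easy to follow.
alpha = update_alpha(0.0, marked_fraction=1.0, g=0.5)
reduced = next_cwnd(100.0, alpha, saw_marks=True)
grown = next_cwnd(100.0, alpha, saw_marks=False)
```

With these values alpha becomes 0.5, the window shrinks from 100 to 75 segments when marks are seen, and grows to 101 segments otherwise.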
This CC mechanism may make it possible to react gradually at one-RTT timescales, which are approximately 10 μs in high-speed DCNs. The in-network optimization achieved in this disclosure is orthogonal and complementary to the selection of the flow rate control algorithm (e.g., DCTCP or DCQCN).
Further, the agent 100 according to this disclosure is compatible with any ECN-based CC protocol and can be deployed along with any other well-established traffic optimization techniques, such as flow scheduling.
FIG. 5 shows an exemplary method 500 for an agent 100 for dynamically adapting an ECN configuration in a network system 1 according to this disclosure. The network system 1 comprises a plurality of agents 100, 110, each agent 100, 110 being associated with a switch port 102, 112, as disclosed before in this disclosure.
The method 500 comprises a step 502 of sending a first message 104 to one or more neighbor agents 110. The first message 104 comprises a property 101 of the switch port 102 associated with the agent 100, and the property of the switch port 102 comprises any port-level metrics available at the switch port 102 that may help to optimize a global property of the network system 1, for example and not as a limitation, at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the switch port 102.
Then, the method 500 comprises a step 504 of receiving a second message 114 from each neighbor agent 110 of the one or more neighbor agents 110. The second message 114 comprises a property 103 of a neighbor switch port 112 associated with the neighbor agent 110. The property 103 of the neighbor switch port 112 comprises any port-level metrics available at the neighbor switch port 112 that may help to optimize a global property of the network system 1, for example and not as a limitation, at least one of a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port 112.
The method 500 further comprises a step 506 of determining the ECN configuration 106 for the switch port 102 associated with the agent 100 based on the property 101 of the switch port 102 and the properties 103 of the neighbor switch ports 112.
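The three steps of the method 500 may be sketched, for illustration only, as a small in-memory exchange between two agents. The classes `PortProperty` and `Agent`, the direct method call standing in for network transport, and in particular the `placeholder_policy` (which simply lowers a single marking threshold as the longest observed queue grows) are assumptions of this sketch and do not reflect the trained decision logic of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PortProperty:
    tx_rate: float        # transmission rate or utilization of the port
    queue_len: float      # instantaneous queue length
    ecn_mark_rate: float  # rate of ECN-marked packets


@dataclass
class Agent:
    name: str
    prop: PortProperty
    inbox: Dict[str, PortProperty] = field(default_factory=dict)

    def send(self, neighbors: List["Agent"]) -> None:
        # Step 502: send the own port property to each neighbor agent.
        for neighbor in neighbors:
            neighbor.receive(self.name, self.prop)

    def receive(self, sender: str, prop: PortProperty) -> None:
        # Step 504: store the property received from a neighbor agent.
        self.inbox[sender] = prop

    def determine(self, policy: Callable[[PortProperty, List[PortProperty]], float]) -> float:
        # Step 506: derive the ECN configuration from own and neighbor properties.
        return policy(self.prop, list(self.inbox.values()))


def placeholder_policy(own: PortProperty, neighbor_props: List[PortProperty]) -> float:
    """Hypothetical rule: the longer the worst observed queue, the lower the threshold k."""
    worst = max([own.queue_len] + [p.queue_len for p in neighbor_props])
    return max(5.0, 100.0 - worst)


a4 = Agent("A4", PortProperty(tx_rate=0.5, queue_len=20.0, ecn_mark_rate=0.0))
a2 = Agent("A2", PortProperty(tx_rate=0.9, queue_len=60.0, ecn_mark_rate=0.1))
a4.send([a2])   # first message from A4 to its neighbor
a2.send([a4])   # second message, as seen from A4
k = a4.determine(placeholder_policy)
```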
The method 500 may further comprise actions according to the aforementioned embodiments of the agent 100. Hence, the method 500 achieves the same advantages as the agent 100.
The present disclosure further provides a computer program comprising instructions that, when the program is executed by a computer, cause the computer to perform the method 500 shown in FIG. 5.
The computer program may be included in a computer readable medium. The computer readable medium may comprise essentially any memory, such as a ROM (Read-Only Memory), a PROM (Programmable Read-Only Memory), an EPROM (Erasable PROM), a Flash memory, an EEPROM (Electrically Erasable PROM), or a hard disk drive.
The computer program achieves the same advantages as the method 500 and as the agent 100.
The present disclosure has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed matter, from the studies of the drawings, this disclosure and the independent claims. In the claims as well as in the description the word "comprising" does not exclude other elements or steps and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Claims (13)

  1. An agent (100) for dynamically adapting an Explicit Congestion Notification, ECN, configuration in a network system (1) , the network system (1) comprising a plurality of agents (100, 110) , each agent (100, 110) being associated with a switch port (102, 112) , and the agent (100) being configured to:
    send a first message (104) to one or more neighbor agents (110) , the first message (104) comprising a property (101) of the switch port (102) associated with the agent (100) ;
    receive a second message (114) from each neighbor agent (110) of the one or more neighbor agents (110) , the second message (114) comprising a property (103) of a neighbor switch port (112) associated with the neighbor agent (110) ; and
    determine the ECN configuration (106) for the switch port (102) associated with the agent (100) based on the property (101) of the switch port (102) and the properties (103) of the neighbor switch ports (112) .
  2. The agent (100) according to claim 1, wherein
    the property (101) of the switch port (102) comprises at least one of: a transmission rate, a queue length, and a rate of ECN marked packets of the switch port (102) ; and/or
    the property (103) of the neighbor switch port (112) comprises at least one of: a transmission rate, a queue length, and a rate of ECN marked packets of the neighbor switch port (112) .
  3. The agent (100) according to claim 1 or 2, wherein before sending the first message (104) to one or more neighbor agents, the agent (100) is configured to:
    collect the property (101) of the switch port (102) associated with the agent (100) .
  4. The agent (100) according to one of claims 1 to 3, further configured to:
    retrieve, from the second message (114) , the property (103) of the neighbor switch port (112) associated with the neighbor agent (110) .
  5. The agent (100) according to one of the claims 1 to 4, wherein the agent (100) is further configured to:
    update the property (101) of the switch port (102) associated with the agent (100) by combining the property (101) of the switch port (102) and the properties (103) of the one or more neighbor switch ports (112) retrieved from the second message (114) .
  6. The agent (100) according to one of claims 1 to 5, further configured to:
    generate an updated first message (104) , wherein the updated first message (104) comprises the updated property (101) of the switch port (102) associated with the agent (100) .
  7. The agent (100) according to one of claims 1 to 6, wherein before the determination of the ECN configuration for the switch port (102) associated with the agent (100) based on the property (101) of the switch port (102) and the properties (103) of the one or more neighbor switch ports (112) , the agent (100) is further configured to:
    perform an iteration procedure comprising performing a predefined number of iterations, wherein in each iteration the following steps are performed:
    sending the updated first message (104) generated in a previous iteration to the one or more neighbor agents (110) , or in the case of the first iteration sending the first message (104) to the one or more neighbor agents (110) , wherein the updated first message (104) comprises the updated property (101) of the switch port (102) associated with the agent (100) ;
    receiving an updated second message (114) from each of the one or more neighbor agents (110) generated in the previous iteration, or in the case of the first iteration receiving the second message (114) from each of the one or more neighbor agents (110) , wherein the updated second message (114) comprises an updated property (103) of the neighbor switch port (112) associated with the neighbor agent (110) ;
    retrieving from the updated second message (114) the updated property (103) of the neighbor switch port (112) ;
    updating the property (101) of the switch port (102) associated with the agent (100) by combining the updated property (101) of the switch port (102) calculated in the previous iteration and the updated property (103) of each neighbor switch port (112) retrieved from the one or more second messages (114) ; and
    generating another updated first message (104) comprising the updated property (101) of the switch port (102) associated with the agent (100) .
  8. The agent (100) according to one of the claims 1 to 7, wherein the determination of the ECN configuration (106) for the switch port (102) associated with the agent (100) based on the property (101) of the switch port (102) and the properties (103) of the neighbor switch ports (112) comprises:
    determining an optimal ECN configuration for the switch port (102) based on the final property (101) of the switch port (102) , wherein the final property (101) of the switch port (102) comprises the updated property (101) of the switch port (102) after the last iteration; or
    in the case of the first iteration, determining an optimal ECN configuration for the switch port (102) based on the property (101) of the switch port (102) and the properties (103) of the neighbor switch ports (112) received in the one or more second messages (114) .
  9. The agent (100) according to claim 8, wherein the determined optimal ECN configuration for the switch port (102) comprises an ECN configuration (106) for the switch port (102) that optimizes the performance of the network system (1) .
  10. The agent (100) according to claim 9, wherein the ECN configuration (106) for the switch port (102) that optimizes the performance of the network system (1) comprises  at least one of: an ECN configuration (106) for the switch port (102) that decreases a flow completion time, FCT, of the network system (1) , an ECN configuration (106) for the switch port (102) that reduces a buffer occupancy of the network system (1) , and an ECN configuration (106) for the switch port (102) that increases a throughput of the network system (1) .
  11. The agent (100) according to one of the claims 1 to 10, wherein the network system (1) is configured based on a Data Center Transmission Control Protocol, DCTCP, or a Data Center Quantized Congestion Notification, DCQCN, protocol.
  12. A method (500) for an agent for dynamically adapting an Explicit Congestion Notification, ECN, configuration in a network system (1) , the network system (1) comprising a plurality of agents, each agent being associated with a switch port, and the method (500) comprising:
    sending (502) a first message (104) to one or more neighbor agents (110) , the first message (104) comprising a property (101) of the switch port (102) associated with the agent (100) ;
    receiving (504) a second message (114) from each neighbor agent (110) of the one or more neighbor agents (110) , the second message (114) comprising a property (103) of a neighbor switch port (112) associated with the neighbor agent (110) ; and
    determining (506) the ECN configuration (106) for the switch port (102) associated with the agent (100) based on the property (101) of the switch port (102) and the properties (103) of the neighbor switch ports (112) .
  13. A computer program comprising instructions which, when the program is executed by a computer, cause the computer to perform the method (500) according to claim 12.
PCT/CN2022/138121 2022-12-09 2022-12-09 Device and method for agent for dynamically adapting explicit congestion notification configuration in network system WO2024119513A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/138121 WO2024119513A1 (en) 2022-12-09 2022-12-09 Device and method for agent for dynamically adapting explicit congestion notification configuration in network system

Publications (1)

Publication Number Publication Date
WO2024119513A1 true WO2024119513A1 (en) 2024-06-13

Family

ID=91378366

Country Status (1)

Country Link
WO (1) WO2024119513A1 (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAN SIYU; WANG XIAOLIANG; ZHENG XIAOLONG; XIA YINBEN: "ACC automatic ECN tuning for high-speed datacenter networks", PROCEEDINGS OF THE 32ND CONFERENCE ON L'INTERACTION HOMME-MACHINE, ACMPUB27, NEW YORK, NY, USA, 9 August 2021 (2021-08-09) - 3 December 2021 (2021-12-03), New York, NY, USA, pages 384 - 397, XP058944450, ISBN: 978-1-4503-8607-4, DOI: 10.1145/3452296.3472927 *
