WO2018210117A1 - Congestion control method, network device, and network interface controller thereof - Google Patents

Congestion control method, network device, and network interface controller thereof

Info

Publication number
WO2018210117A1
Authority
WO
WIPO (PCT)
Prior art keywords
data packet
identifier
address
path
destination
Prior art date
Application number
PCT/CN2018/084819
Other languages
English (en)
French (fr)
Inventor
宋浩宇
冀智刚
张亚丽
夏寅贲
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP18803065.4A (EP3618372B1)
Publication of WO2018210117A1
Priority to US16/683,730 (US11228534B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/19 Flow control; Congestion control at layers above the network layer
    • H04L47/193 Flow control; Congestion control at layers above the network layer at the transport layer, e.g. TCP related
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/27 Evaluation or update of window size, e.g. using information derived from acknowledged [ACK] packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/26 Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
    • H04L47/263 Rate modification at the source after receiving feedback
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/31 Flow control; Congestion control by tagging of packets, e.g. using discard eligibility [DE] bits
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9063 Intermediate storage in different physical parts of a node or terminal
    • H04L49/9068 Intermediate storage in different physical parts of a node or terminal in the network interface card

Definitions

  • the present application relates to the field of communications, and in particular, to a congestion control method, a network device, and a network interface controller thereof.
  • DCTCP: Data Center Transmission Control Protocol
  • ECN: Explicit Congestion Notification
  • the data packet sent by the sending device includes a congestion field.
  • the intermediate network device determines whether to set a congestion flag in the congestion field according to the current queue length of the device.
  • the receiving device carries the congestion marking information in the acknowledgement (ACK) message and returns it to the sending device.
  • the sending device may determine whether congestion occurs in the data packet transmission according to the congestion marking information in the acknowledgement packet, so as to adjust the sending rate (for example, when the congestion occurs, the sending window size is reduced).
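The sender-side reaction described above can be sketched as follows. This is a minimal illustration in the style of DCTCP, not the method of this application; the class name `DctcpSender`, the starting window, and the gain `g` are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DctcpSender:
    """Toy sender that shrinks its window in proportion to ECN marks."""
    cwnd: float = 100.0   # congestion window in segments (assumed start value)
    alpha: float = 0.0    # running estimate of the fraction of marked packets
    g: float = 1 / 16     # estimation gain (assumed value)

    def on_window_acked(self, acked: int, marked: int) -> float:
        """Update the window after one window of ACKs; return the new cwnd."""
        frac = marked / acked if acked else 0.0
        # Exponentially weighted moving average of the marked fraction.
        self.alpha = (1 - self.g) * self.alpha + self.g * frac
        if marked:
            # Reduce the window in proportion to the estimated congestion.
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        return self.cwnd
```

A window with no marked acknowledgements leaves `cwnd` unchanged, so the reaction scales with how many packets were marked rather than being a fixed cut.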
  • the congestion control used in the above data transmission process is relatively simple and cannot provide more accurate and effective congestion control.
  • the present application provides a congestion control method, a network device, and a network interface controller thereof, which help provide more accurate and effective congestion control and improve network resource utilization.
  • a congestion control method is provided.
  • the method is applied to a data transmission network comprising a transmitting device, at least one intermediate device, and a receiving device.
  • the sending device sends a data packet to the receiving device by using the intermediate device.
  • the method includes:
  • the first intermediate device of the at least one intermediate device receives the first data packet sent by the sending device, and sends the first data packet to the receiving device by using the first path;
  • the window value is used to notify the sending device of the number of bytes that the receiving device can receive.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the first intermediate device modifies the window value in the acknowledgment message returned to the sending device, so that the sending device can adjust the sending rate according to the window value in the acknowledgment message.
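As a rough illustration of what modifying the window value in an acknowledgement could involve, the sketch below patches the 16-bit window field of a plain TCP header and updates the checksum incrementally. The text does not say how the field is rewritten; the function names are hypothetical, and the sketch assumes a standard TCP header layout (window at byte offset 14, checksum at offset 16) and ignores the window scale option.

```python
import struct


def _fold(x: int) -> int:
    """Fold carries back in to get a 16-bit one's-complement sum."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x


def checksum(data: bytes) -> int:
    """Internet checksum over the given bytes (no pseudo-header here)."""
    if len(data) % 2:
        data += b"\x00"
    words = struct.unpack(f"!{len(data) // 2}H", data)
    return ~_fold(sum(words)) & 0xFFFF


def rewrite_window(tcp_header: bytes, new_window: int) -> bytes:
    """Return a copy of the header with a new window and adjusted checksum."""
    old_window, old_csum = struct.unpack_from("!HH", tcp_header, 14)
    # Incremental update: new = ~(~old_csum + ~old_word + new_word).
    total = (~old_csum & 0xFFFF) + (~old_window & 0xFFFF) + new_window
    new_csum = ~_fold(total) & 0xFFFF
    out = bytearray(tcp_header)
    struct.pack_into("!HH", out, 14, new_window, new_csum)
    return bytes(out)
```

Updating the checksum incrementally avoids touching the payload, which matters for a device that rewrites acknowledgements in the forwarding path.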
  • the window value is modified according to the congestion degree of the path and the number of bytes of the data packet, which helps to achieve more accurate and effective congestion control and improve network resource utilization.
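The two inputs named here, the congestion degree of the path and the byte count of a flow, could be tracked along the following lines. This is only a sketch: the text does not state how congestion marks map to a degree, so the marked-fraction thresholds, the byte threshold, and all names are assumptions.

```python
from collections import defaultdict
from enum import Enum


class Congestion(Enum):
    NONE = 0
    LIGHT = 1
    HEAVY = 2


def classify_path(marked_acks: int, total_acks: int,
                  light: float = 0.1, heavy: float = 0.5) -> Congestion:
    """Map the fraction of congestion-marked ACKs to a degree (assumed thresholds)."""
    if total_acks == 0:
        return Congestion.NONE
    frac = marked_acks / total_acks
    if frac >= heavy:
        return Congestion.HEAVY
    if frac >= light:
        return Congestion.LIGHT
    return Congestion.NONE


class FlowBytes:
    """Per-flow byte counter used to spot elephant flows."""

    def __init__(self, threshold: int = 1_000_000):
        self.threshold = threshold          # assumed set threshold in bytes
        self.counts = defaultdict(int)

    def add(self, flow, nbytes: int) -> None:
        self.counts[flow] += nbytes

    def is_elephant(self, flow) -> bool:
        # A flow is an elephant once its byte count exceeds the threshold.
        return self.counts.get(flow, 0) > self.threshold
```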
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is light, and the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold (i.e., the data packets having the first identifier belong to an elephant flow), the first intermediate device reduces the window value in the acknowledgement messages having the second identifier in the first acknowledgement message.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
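The relationship between the two identifiers is a swap of the source and destination halves of the four-tuple, which can be sketched as below. The names `FlowId` and `ack_identifier` are hypothetical.

```python
from typing import NamedTuple


class FlowId(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def ack_identifier(first: FlowId) -> FlowId:
    """Build the second identifier (carried by ACKs) from the first identifier."""
    # Source and destination swap because ACKs travel in the reverse direction.
    return FlowId(first.dst_ip, first.dst_port, first.src_ip, first.src_port)
```

Applying the swap twice returns the original identifier, so one lookup table keyed on either form is enough to match acknowledgements to the data flow they confirm.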
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, and there is no path from the first intermediate device to the receiving device that is lightly congested or uncongested (i.e., there is no other path from the first intermediate device to the receiving device, or all paths from the first intermediate device to the receiving device are heavily congested), the first intermediate device reduces the window value in all acknowledgement messages in the first acknowledgement message.
  • as a result, the sending device reduces the transmission rate of all data packets (both elephant flows and mice flows) on the first path.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold (i.e., the data packets having the first identifier belong to an elephant flow), and there is a second path from the first intermediate device to the receiving device that is lightly congested or uncongested (i.e., a path to switch to is available), the first intermediate device sets the window value to 0 in the acknowledgement messages having the second identifier in the first acknowledgement message, and starts a timer.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
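The three cases above can be condensed into one decision function. This is a sketch under stated assumptions: the text only says the window is "reduced", so the halving factor is hypothetical, and leaving the window unchanged in cases the text does not cover (for example, a mouse flow on a heavily congested path when another path exists) is also an assumption.

```python
from enum import Enum


class Congestion(Enum):
    NONE = 0
    LIGHT = 1
    HEAVY = 2


def adjust_window(window: int, congestion: Congestion, is_elephant: bool,
                  alt_path_available: bool, factor: float = 0.5):
    """Return (new_window, start_timer) for one acknowledgement."""
    if congestion is Congestion.LIGHT and is_elephant:
        # Light congestion: slow down only the elephant flow.
        return int(window * factor), False
    if congestion is Congestion.HEAVY and not alt_path_available:
        # Heavy congestion with nowhere to move: slow down every flow.
        return int(window * factor), False
    if congestion is Congestion.HEAVY and is_elephant:
        # Heavy congestion with a better path: pause the elephant flow
        # (window 0) and start the timer that precedes the path switch.
        return 0, True
    # Cases not described in the text: leave the window as is (assumption).
    return window, False
```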
  • the method further includes: after the timer expires, the first intermediate device sends, via the second path, to the receiving device the data packets having the first identifier in the second data packet received from the sending device. Setting the window value of the acknowledgement message to 0 creates an opportunity to switch paths. Setting a timer gives the packets already in transit on the first path time to reach the receiving device, which reduces the likelihood of packet reordering caused by the switch.
  • before the timer expires, if the first intermediate device receives a second acknowledgement message sent by the receiving device, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device, the first intermediate device sends, via the second path, to the receiving device the data packets having the first identifier in the third data packet received from the sending device. If all data packets sent by the sending device have reached the receiving device before the timer expires (so that switching cannot cause reordering), the first intermediate device switches paths immediately, which helps reduce waiting time.
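The switch-over rule (switch when the timer expires, or earlier once everything in flight has been acknowledged) can be sketched as a small state holder. The class and parameter names are hypothetical, time is passed in explicitly to keep the example deterministic, and sequence-number wraparound is ignored for simplicity.

```python
class PathSwitch:
    """Tracks when a paused flow may move from the first to the second path."""

    def __init__(self, now: float, timeout: float, last_seq_sent: int):
        self.deadline = now + timeout        # timer started with window = 0
        self.last_seq_sent = last_seq_sent   # end of data already sent on path 1
        self.switched = False

    def on_ack(self, ack_number: int) -> None:
        # Early switch: every byte sent on the first path has been received,
        # so moving the flow cannot reorder packets.
        if ack_number >= self.last_seq_sent:
            self.switched = True

    def path_for(self, now: float) -> str:
        # Timer expiry: in-flight packets have had time to drain.
        if now >= self.deadline:
            self.switched = True
        return "second" if self.switched else "first"
```

The timeout would plausibly be chosen near the drain time of the first path; the text does not give a value, so any concrete number is a tuning assumption.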
  • a congestion control method is provided. This method is applied to data transmission networks.
  • the data transmission network includes a transmitting device, at least one intermediate device, and a receiving device.
  • the sending device includes a processor and a network interface controller (NIC).
  • the sending device sends a data packet to the receiving device by using the intermediate device.
  • the method includes:
  • the network interface controller NIC receives the first data packet sent by the processor, and sends the first data packet to the receiving device by using the first path;
  • the NIC receives a first acknowledgement message sent by the receiving device to acknowledge the first data packet, and determines a congestion degree of the first path according to a congestion flag in the first acknowledgement message;
  • the NIC modifies the window value in the first acknowledgement message according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet, and sends the modified first acknowledgement message to the processor.
  • the window value is used to notify the processor of the number of bytes that the receiving device can receive.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the NIC modifies the window value in the acknowledgment message returned to the processor, thereby enabling the processor to adjust the transmission rate according to the window value in the acknowledgment message.
  • the window value is modified according to the congestion degree of the path and the number of bytes of the data packet, which helps to achieve more accurate and effective congestion control.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is light, and the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, the NIC reduces the window value in the acknowledgement messages having the second identifier in the first acknowledgement message.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, and there is no path from the NIC to the receiving device that is lightly congested or uncongested, the NIC reduces the window value in all acknowledgement messages in the first acknowledgement message.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, and there is a second path from the NIC to the receiving device that is lightly congested or uncongested, the NIC sets the window value to 0 in the acknowledgement messages having the second identifier in the first acknowledgement message, and starts a timer.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • the method further includes: after the timer expires, the NIC sends, via the second path, to the receiving device the data packets having the first identifier in a second data packet received from the processor.
  • before the timer expires, if the NIC receives a second acknowledgement message sent by the receiving device, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device, the NIC sends, via the second path, to the receiving device the data packets having the first identifier in the third data packet received from the processor.
  • a network device is provided.
  • the network device is used in a data transmission network.
  • the data transmission network includes a transmitting device, at least one intermediate device, and a receiving device.
  • the network device is a first intermediate device of the at least one intermediate device.
  • the sending device sends a data packet to the receiving device by using the intermediate device.
  • the network device includes a receiving unit, a transmitting unit, and a processing unit.
  • the receiving unit is configured to receive the first data packet sent by the sending device, and is further configured to receive a first acknowledgement packet that is sent by the receiving device and used to confirm the first data packet.
  • a sending unit configured to send the first data packet to the receiving device by using the first path.
  • a processing unit configured to determine a congestion degree of the first path according to a congestion flag in the first acknowledgement message, and to modify the window value in the first acknowledgement message according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the sending unit is further configured to send the modified first confirmation message to the sending device.
  • the window value is used to notify the sending device of the number of bytes that the receiving device can receive.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is light, and the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, the processing unit reduces the window value in the acknowledgement messages having the second identifier in the first acknowledgement message.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, and there is no path from the network device to the receiving device that is lightly congested or uncongested, the processing unit reduces the window value in all acknowledgement messages in the first acknowledgement message.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, and there is a second path from the network device to the receiving device that is lightly congested or uncongested, the processing unit sets the window value to 0 in the acknowledgement messages having the second identifier in the first acknowledgement message, and starts a timer.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • the sending unit is further configured to: after the timer expires, send, via the second path, to the receiving device the data packets having the first identifier in the second data packet received by the receiving unit from the sending device.
  • the sending unit is further configured to: before the timer expires, if the receiving unit receives a second acknowledgement message sent by the receiving device, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device, send, via the second path, to the receiving device the data packets having the first identifier in the third data packet received by the receiving unit from the sending device.
  • a network interface controller NIC is provided.
  • the NIC is used in a data transmission network.
  • the data transmission network includes a transmitting device, at least one intermediate device, and a receiving device.
  • the transmitting device includes a processor and the NIC.
  • the sending device sends a data packet to the receiving device by using the intermediate device.
  • the NIC includes a receiving unit, a transmitting unit, and a processing unit.
  • the receiving unit is configured to receive the first data packet sent by the processor, and is further configured to receive a first acknowledgement packet that is sent by the receiving device and used to confirm the first data packet.
  • a sending unit configured to send the first data packet to the receiving device by using the first path.
  • a processing unit configured to determine a congestion degree of the first path according to a congestion flag in the first acknowledgement message, and to modify the window value in the first acknowledgement message according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the sending unit is further configured to send the modified first confirmation message to the processor.
  • the window value is used to notify the processor of the number of bytes that the receiving device can receive.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is light, and the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, the processing unit reduces the window value in the acknowledgement messages having the second identifier in the first acknowledgement message.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, and there is no path from the NIC to the receiving device that is lightly congested or uncongested, the processing unit reduces the window value in all acknowledgement messages in the first acknowledgement message.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, and there is a second path from the NIC to the receiving device that is lightly congested or uncongested, the processing unit sets the window value to 0 in the acknowledgement messages having the second identifier in the first acknowledgement message, and starts a timer.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • the sending unit is further configured to: after the timer expires, send, via the second path, to the receiving device the data packets having the first identifier in the second data packet received by the receiving unit from the processor.
  • the sending unit is further configured to: before the timer expires, if the receiving unit receives a second acknowledgement message sent by the receiving device, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device, send, via the second path, to the receiving device the data packets having the first identifier in the third data packet received by the receiving unit from the processor.
  • a network device is provided.
  • the network device is used in a data transmission network.
  • the data transmission network includes a transmitting device, at least one intermediate device, and a receiving device.
  • the network device is a first intermediate device of the at least one intermediate device.
  • the sending device sends a data packet to the receiving device by using the intermediate device.
  • the network device includes a receiver, a transmitter, and a processor.
  • the receiver is configured to receive the first data packet that is sent by the sending device, and is further configured to receive a first acknowledgement packet that is sent by the receiving device and used to confirm the first data packet.
  • a transmitter configured to send the first data packet to the receiving device by using a first path.
  • a processor configured to determine a congestion degree of the first path according to a congestion flag in the first acknowledgement message, and to modify the window value in the first acknowledgement message according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the transmitter is further configured to send the modified first acknowledgement message to the sending device.
  • the window value is used to notify the sending device of the number of bytes that the receiving device can receive.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is light, and the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, the processor reduces the window value in the acknowledgement messages having the second identifier in the first acknowledgement message.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, and there is no path from the network device to the receiving device that is lightly congested or uncongested, the processor reduces the window value in all acknowledgement messages in the first acknowledgement message.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, and there is a second path from the network device to the receiving device that is lightly congested or uncongested, the processor sets the window value to 0 in the acknowledgement messages having the second identifier in the first acknowledgement message, and starts a timer.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • the transmitter is further configured to: after the timer expires, send, via the second path, to the receiving device the data packets having the first identifier in the second data packet received by the receiver from the sending device.
  • the transmitter is further configured to: before the timer expires, if the receiver receives a second acknowledgement message sent by the receiving device, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device, send, via the second path, to the receiving device the data packets having the first identifier in the third data packet received by the receiver from the sending device.
  • a network interface controller NIC is provided.
  • the NIC is used in a data transmission network.
  • the data transmission network includes a transmitting device, at least one intermediate device, and a receiving device.
  • the transmitting device includes a first processor and the NIC.
  • the sending device sends a data packet to the receiving device by using the intermediate device.
  • the NIC includes a receiver, a transmitter, and a second processor.
  • the receiver is configured to receive the first data packet that is sent by the first processor, and is further configured to receive a first acknowledgement packet that is sent by the receiving device and used to confirm the first data packet.
  • a transmitter configured to send the first data packet to the receiving device by using a first path.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the transmitter is further configured to send the modified first acknowledgement message to the first processor.
  • the window value is used to notify the first processor of the number of bytes that the receiving device can receive.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is light, and the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, the second processor reduces the window value in the acknowledgement messages having the second identifier in the first acknowledgement message.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, and there is no path from the NIC to the receiving device that is lightly congested or uncongested, the second processor reduces the window value in all acknowledgement messages in the first acknowledgement message.
  • modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is heavy, the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, and there is a second path from the NIC to the receiving device that is lightly congested or uncongested, the second processor sets the window value to 0 in the acknowledgement messages having the second identifier in the first acknowledgement message, and starts a timer.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • the transmitter is further configured to: after the timer expires, send, via the second path, to the receiving device the data packets having the first identifier in the second data packet received by the receiver from the first processor.
  • the transmitter is further configured to: before the timer expires, if the receiver receives a second acknowledgement message sent by the receiving device, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device, send, via the second path, to the receiving device the data packets having the first identifier in the third data packet received by the receiver from the first processor.
  • a network device is provided.
  • the network device is used in a data transmission network.
  • the data transmission network includes a transmitting device, at least one intermediate device, and a receiving device.
  • the network device is the sending device.
  • the sending device sends a data packet to the receiving device by using the intermediate device.
  • the transmitting device includes: a processor and a network interface controller NIC.
  • the processor is configured to send a first data packet to the NIC.
  • the NIC is configured to: receive the first data packet sent by the processor; send the first data packet to the receiving device via a first path; receive a first acknowledgement packet that is sent by the receiving device to confirm the first data packet; determine the congestion degree of the first path according to the congestion flag in the first acknowledgement packet; modify the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet; and send the modified first acknowledgement packet to the processor.
  • the window value is used to notify the processor of the number of bytes that the receiving device can receive.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the processor is further configured to receive the modified first acknowledgement packet sent by the NIC, and send the second data packet to the receiving device by using the NIC according to the modified first acknowledgement packet.
  • the modifying the window value in the first acknowledgement message includes: if the congestion degree of the first path is light, and the first data packet has the data packet of the first identifier If the number of bytes is greater than the set threshold, the NIC reduces the window value in the acknowledgement message with the second identifier in the first acknowledgement message.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • the modifying the window value in the first acknowledgement packet includes: if the congestion degree of the first path is heavy, and there is no path from the NIC to the receiving device whose congestion degree is light or not congested, the NIC reduces the window value in all the acknowledgement packets in the first acknowledgement packet.
  • the modifying the window value in the first acknowledgement packet includes: if the congestion degree of the first path is heavy, the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold, and there is a second path from the NIC to the receiving device whose congestion degree is light or not congested, the NIC sets the window value in the acknowledgement packets having the second identifier in the first acknowledgement packet to 0, and starts a timer.
  • the second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the source IP address, the source port, the destination IP address, and the destination port in the first identifier are the same as the destination IP address, the destination port, the source IP address, and the source port in the second identifier, respectively.
  • the NIC is further configured to: after the timer expires, send, to the receiving device via the second path, the data packets having the first identifier in the second data packet that the NIC receives from the processor.
  • the NIC is further configured to: before the timer expires, if the NIC receives a second acknowledgement packet sent by the receiving device, and the second acknowledgement packet indicates that all data packets having the first identifier in the first data packet have been received by the receiving device, send, to the receiving device via the second path, the data packets having the first identifier in a third data packet that the NIC receives from the processor.
  • a computer readable storage medium stores instructions that, when executed on a computer, cause the computer to perform the methods described in the various aspects above.
  • a computer program product comprising instructions is provided.
  • when the computer program product is run on a computer, the computer is caused to perform the methods described in the various aspects above.
  • FIG. 1 is a schematic structural diagram of a network according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a congestion control method according to an embodiment of the present application.
  • FIG. 3 is a structural diagram of a network device 300 according to an embodiment of the present application.
  • FIG. 4 is a structural diagram of a network device 400 according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a network according to an embodiment of the present application.
  • FIG. 6 is a flowchart of a congestion control method according to an embodiment of the present application.
  • FIG. 7 is a structural diagram of a network interface controller 700 according to an embodiment of the present application.
  • FIG. 8 is a structural diagram of a network interface controller 800 according to an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a network provided by an embodiment of the present application.
  • the network includes a transmitting device 101, at least one intermediate device (e.g., intermediate devices 102, 103, 104, 105, 107, and 108 in Figure 1) and a receiving device 106.
  • the transmitting device 101 transmits a data message to the receiving device 106 through the intermediate device.
  • the transmitting device 101 can establish a Transmission Control Protocol (TCP) connection with the receiving device 106.
  • the data messages transmitted by the transmitting device 101 all arrive at the intermediate device 102 and can then arrive at the receiving device 106 via the first path.
  • the intermediate device 102 may be the first-hop network device or the Nth-hop network device reached by the data packets that the sending device 101 sends to the receiving device 106, where N is an integer greater than 1.
  • the first path may include zero, one or more intermediate devices (eg, intermediate devices 103, 104, and 105 in FIG. 1).
  • the acknowledgment message (for example, ACK message) sent by the receiving device 106 for confirming the data message sent by the sending device 101 can reach the intermediate device 102 via the first path, and then sent by the intermediate device 102 to the transmitting device 101.
  • the acknowledgment message may arrive at the intermediate device 102 via a second path different from the first path, and then sent by the intermediate device 102 to the transmitting device 101.
  • the second path may include zero, one or more intermediate devices (eg, intermediate devices 105, 108, and 107 in FIG. 1).
  • the acknowledgment message for confirming the data message transmitted by the transmitting device 101 passes through the intermediate device 102 to the transmitting device 101.
  • the network device is applied to the data center.
  • the sending device 101 can be a host or a server, and the receiving device 106 can be a host or a server.
  • the intermediate device can be a switch, a router, a virtual switch, or the like.
  • the intermediate device 102 can also be a top of rack (TOR) switch, and the intermediate device 105 can also be a TOR switch.
  • FIG. 2 is a flowchart of a congestion control method according to an embodiment of the present application. This method can be applied to the network structure shown in FIG. 1. The method comprises the following steps:
  • the intermediate device 102 receives the first data packet sent by the sending device 101, and sends the first data packet to the receiving device 106 via the first path.
  • the first path may include zero, one or more intermediate devices, such as the intermediate devices 103, 104, and 105 shown in FIG. 1.
  • after receiving the first data packet, the intermediate device 102 selects a transmission path for the first data packet and uses source routing to carry the information of the selected path in the data packet that it sends.
  • the information of the path is, for example, an IP address of each intermediate device included in the first path through which the first data packet passes.
  • the intermediate device 102 adds the IP addresses of the intermediate devices to be traversed (for example, the IP addresses of the intermediate devices 103, 104, and 105) to the options field of the Internet Protocol (IP) header of the first data packet. Each intermediate device on the first path can therefore forward the data packet according to the IP addresses in the options field of the IP header.
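  • As a rough illustration of carrying the path in the IP options field, the following sketch builds a source route option in the layout defined by RFC 791 (type, length, pointer, route data). The choice of the strict source route option (type 137) as the default, the function name, and the sample addresses are assumptions made for illustration; the application does not state which option type is used.

```python
import ipaddress

SSRR = 137  # strict source and record route option type (RFC 791)
LSRR = 131  # loose source and record route option type (RFC 791)

def build_source_route_option(hops, option_type=SSRR):
    """Return option bytes: type, length, pointer, then one 4-byte address per hop."""
    route = b"".join(ipaddress.IPv4Address(h).packed for h in hops)
    length = 3 + len(route)  # type + length + pointer octets, plus route data
    pointer = 4              # smallest legal value: points at the first address
    return bytes([option_type, length, pointer]) + route

# Hypothetical addresses standing in for intermediate devices 103, 104 and 105.
option = build_source_route_option(["10.0.0.3", "10.0.0.4", "10.0.0.5"])
```

  • The IP header allows at most 40 bytes of options, so a sketch like this can carry only a handful of hops.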
  • for the specific implementation of source routing, refer to Internet Engineering Task Force (IETF) RFC 791 (INTERNET PROTOCOL, DARPA INTERNET PROGRAM PROTOCOL SPECIFICATION), the entire contents of which are incorporated herein by reference.
  • the first data packet may include multiple data packets, and the multiple data packets may have different identifiers.
  • the plurality of data messages can be distinguished by identification.
  • the identifier includes, for example, a source IP address, a source port, a destination IP address, and a destination port.
  • the identification may, for example, also include a transport layer protocol number (the transport layer protocol number of TCP is 6), whereby the identity constitutes a five-tuple.
  • the intermediate device 102 may record the correspondence between the identifier of each data packet in the first data packet and the first path.
  • for example, the intermediate device 102 records the first identifier of the data packets having the first identifier in the first data packet (for example, including the source IP address of the sending device 101, the source port on the sending device 101, the destination IP address of the receiving device 106, and the destination port on the receiving device 106) together with the corresponding information of the first path (for example, including the IP addresses of the intermediate devices 103, 104, and 105).
  • another identifier, belonging to data packets in the first data packet that have that other identifier, may also be recorded (e.g., the source IP address of the sending device 101, another source port on the sending device 101, the destination IP address of the receiving device 106, and a destination port on the receiving device 106).
  • the intermediate device 102 can record the correspondence between the identifiers of all data packets sent through the first path and the first path.
  • the data packets sent through the first path may further include data packets that do not travel from the sending device 101 to the receiving device 106 (that is, the sending end is not the sending device 101 or the receiving end is not the receiving device 106, but the data packets are transmitted over the first path; not shown in FIG. 1).
  • the intermediate device 102 can also record the identifier of the data message sent to the receiving device 106 via other paths and the information of the corresponding path.
  • the intermediate device 102 records the correspondence between the identifier of the data packet and the path, and the path corresponding to the identifier can be determined according to the identifier of the data packet (for example, the first path is determined according to the first identifier).
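  • The recorded correspondence can be pictured as a lookup table keyed by the identifier. The sketch below is only one possible arrangement; the names `FlowId` and `PathTable` and the sample addresses are hypothetical and do not come from the application.

```python
from collections import namedtuple

# Four-tuple identifier as described above (illustrative field names).
FlowId = namedtuple("FlowId", "src_ip src_port dst_ip dst_port")

class PathTable:
    """Maps a data packet identifier to the path its packets were sent on."""

    def __init__(self):
        self._path_of = {}

    def record(self, flow_id, path):
        # The path is stored as the ordered addresses of its intermediate devices.
        self._path_of[flow_id] = tuple(path)

    def lookup(self, flow_id):
        # Returns None when no path has been recorded for this identifier.
        return self._path_of.get(flow_id)

table = PathTable()
first_id = FlowId("192.0.2.1", 40000, "198.51.100.6", 80)
table.record(first_id, ["10.0.0.3", "10.0.0.4", "10.0.0.5"])
```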
  • the intermediate device on the first path sets a congestion flag in the first data packet.
  • each intermediate device on the first path can determine its congestion degree according to the current length of the queue on that device that holds the first data packet.
  • the queue length is, for example, the length of the data packet to be transmitted buffered in the transmission buffer on each intermediate device.
  • two thresholds may be set on the intermediate device (for example, t1 and t2, where t1 < t2). When the queue length is less than the threshold t1, the congestion degree is not congested; when the queue length is greater than or equal to the threshold t1 and less than the threshold t2, the congestion degree is light congestion; otherwise, the congestion degree is heavy congestion.
  • the intermediate device compares the degree of congestion with the degree of congestion indicated by the congestion flag of the first data message.
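  • A minimal sketch of the threshold comparison follows. The numeric codes mirror the 2-bit congestion flag values used in this description; placing the boundary between light and heavy congestion exactly at t2 is an assumption, as are the function and constant names.

```python
NOT_CONGESTED, LIGHT, HEAVY = 0b00, 0b01, 0b10

def classify_queue(queue_len, t1, t2):
    """Map a transmit-queue length to a congestion degree using thresholds t1 < t2."""
    if not t1 < t2:
        raise ValueError("t1 must be smaller than t2")
    if queue_len < t1:
        return NOT_CONGESTED
    if queue_len < t2:   # assumed boundary: heavy congestion starts at t2
        return LIGHT
    return HEAVY
```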
  • the congestion flag of the first data packet may include a plurality of congestion tags, and each of the plurality of congestion tags is included in a data packet in the first data packet.
  • the first data packet uses the differentiated service code point (DSCP) field of the IP packet header as the congestion flag.
  • for the DSCP field of the IP header, refer to IETF RFC 2474 (Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers), the entire contents of which are incorporated herein by reference.
  • the DSCP field includes 4 bits. For data packets sent from the transmitting device 101 to the receiving device 106, for example, the first 2 bits of the 4 bits can be used as the congestion flag. A congestion flag of 00 indicates no congestion, 01 indicates light congestion, and 10 indicates heavy congestion.
  • if the intermediate device 103 determines that the congestion degree on its own device is heavy congestion, it updates the congestion flag in the first data packet to 10 when the flag is 00 or 01, and leaves the flag unchanged when it is already 10.
  • if the intermediate device 103 determines that the congestion degree on its own device is light congestion, it updates the congestion flag in the first data packet to 01 when the flag is 00, and leaves the flag unchanged when it is 01 or 10.
  • if the intermediate device 103 determines that its own device is not congested, it does not update the congestion flag.
  • the congestion flag in the data message received by the receiving device 106 can indicate the heaviest congestion level on the entire path.
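  • Because the flag values 00, 01 and 10 are ordered by severity, the update rule above amounts to keeping the larger of the packet's flag and the local congestion degree. The sketch below illustrates this; the function name and the sample per-hop values are hypothetical.

```python
from functools import reduce

NOT_CONGESTED, LIGHT, HEAVY = 0b00, 0b01, 0b10

def update_flag(packet_flag, local_degree):
    """Raise the flag to the local degree if that is worse; never lower it."""
    return max(packet_flag, local_degree)

# Congestion degrees seen at successive hops of a hypothetical path.
hop_degrees = [NOT_CONGESTED, LIGHT, NOT_CONGESTED, HEAVY, LIGHT]
flag_at_receiver = reduce(update_flag, hop_degrees, NOT_CONGESTED)
```

  • With these sample values the receiver sees the heavy-congestion flag, which is the worst degree on the path.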
  • intermediate devices on other paths can also set congestion flags for data packets sent over other paths in the above manner.
  • the intermediate device 102 may also determine the congestion level according to the queue length of the current intermediate device 102 for saving the first data packet, and update the congestion flag in the first data packet by using the congestion flag corresponding to the congestion level.
  • the receiving device 106 sends a first acknowledgement message to the intermediate device 102.
  • after receiving the first data packet, the receiving device 106 obtains the congestion flag in the first data packet (for example, the first two bits of the DSCP field) and generates a first acknowledgement packet.
  • the first acknowledgement packet is used to confirm the first data packet, for example, an ACK packet in the TCP protocol.
  • the first acknowledgment message may include a plurality of acknowledgment messages, and each acknowledgment message is used to confirm a data packet in the first data packet, and the plurality of acknowledgment messages may be distinguished by the identifier.
  • the identifier includes, for example, a source IP address, a source port, a destination IP address, and a destination port.
  • the identification may, for example, also include a transport layer protocol number (the transport layer protocol number of TCP is 6), whereby the identity constitutes a five-tuple.
  • the correspondence between the identifier of a data packet and the identifier of the acknowledgement packet that confirms it is as follows: the source IP address, the source port, the destination IP address, and the destination port in the identifier of the data packet are the same as the destination IP address, the destination port, the source IP address, and the source port in the identifier of the acknowledgement packet, respectively.
  • the identifier of the first data packet and the identifier of the first acknowledgement packet have the above corresponding relationship. Therefore, the first identifier of the data packet with the first identifier in the first data packet may correspond to the second identifier of the acknowledgement packet with the second identifier in the first acknowledgement packet.
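  • The correspondence between the two identifiers is a swap of the source and destination halves of the four-tuple. The following sketch shows this; the class name, method name and sample addresses are illustrative assumptions.

```python
from typing import NamedTuple

class FlowId(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def reversed(self):
        """Identifier of the packets travelling in the opposite direction."""
        return FlowId(self.dst_ip, self.dst_port, self.src_ip, self.src_port)

# First identifier (data packet) and the second identifier derived from it.
first_id = FlowId("192.0.2.1", 40000, "198.51.100.6", 80)
second_id = first_id.reversed()
```

  • Applying the swap to the second identifier yields the first identifier again, which is how an acknowledgement packet can be matched back to the recorded path.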
  • the first acknowledgement packet uses the last 2 bits of the DSCP field of the IP header as the congestion flag, and the value of the congestion flag in the first acknowledgement packet sent by the receiving device 106 is the same as the value of the congestion flag in the first data packet received by the receiving device 106.
  • the first acknowledgement message may arrive at the intermediate device 102 by the same path (ie, the first path) that the first data message is sent through.
  • the first acknowledgement message may also arrive at the intermediate device 102 via a second path that is different from the first path.
  • the second path may include zero, one or more intermediate devices, such as the intermediate devices 105, 108, and 107 shown in FIG.
  • the data message sent by the intermediate device 102 to the receiving device 106 via other paths also has a corresponding acknowledgement message sent by the receiving device 106 to the intermediate device 102.
  • the identifier of the data packet sent by the other path and the identifier of the corresponding acknowledgement packet also have the above corresponding relationship.
  • for such data packets, the acknowledgement packets sent by the receiving device 106 may also carry the congestion flag in the manner described above for the first acknowledgement packet, and may be returned over the same path on which the data packets were sent or over a different path.
  • the intermediate device 102 determines a congestion degree of the first path according to the congestion flag in the first acknowledgement message.
  • the intermediate device 102 may determine the first identifier according to the second identifier in the first acknowledgement packet (refer to the correspondence between the first identifier and the second identifier in S203), and further determine the information about the first path according to the first identifier (see S201, in which the intermediate device 102 records the correspondence between the first identifier and the first path). Further, the intermediate device 102 may calculate the congestion degree of the first path according to the number or proportion of the acknowledgement packets in the first acknowledgement packet that indicate a given congestion degree.
  • for example, the intermediate device 102 calculates the proportion of acknowledgement packets whose congestion flag indicates heavy congestion among the multiple acknowledgement packets in the first acknowledgement packet. When the proportion is greater than the threshold r2, the congestion degree of the first path is determined to be heavy congestion; when the proportion is less than or equal to the threshold r2 and greater than or equal to the threshold r1 (r1 < r2), the congestion degree of the first path is determined to be light congestion; and when the proportion is less than the threshold r1, the first path is determined to be not congested.
  • the intermediate device 102 may instead take the sum of the acknowledgement packets whose congestion flag indicates heavy congestion and those whose congestion flag indicates light congestion, and compare the ratio of that sum to the total number of acknowledgement packets in the first acknowledgement packet with the threshold r1 and the threshold r2 (r1 < r2).
  • the intermediate device 102 may also compare the number of acknowledgement packets, among the multiple acknowledgement packets in the first acknowledgement packet, whose congestion flag indicates congestion with the threshold n1 and the threshold n2 (n1 < n2). When the number is greater than the threshold n2, the congestion degree of the first path is determined to be heavy congestion; when the number is less than or equal to the threshold n2 and greater than the threshold n1, the congestion degree of the first path is determined to be light congestion; and when the number is less than the threshold n1, the first path is determined to be not congested.
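  • The proportion-based variant with thresholds r1 and r2 can be sketched as follows. The function name, the representation of the acknowledgement packets as a list of flag values, and the sample thresholds are assumptions made for illustration only.

```python
NOT_CONGESTED, LIGHT, HEAVY = 0b00, 0b01, 0b10

def path_congestion(ack_flags, r1, r2):
    """Derive a path's congestion degree from the flags of its acknowledgement packets.

    ack_flags holds one congestion flag per acknowledgement packet; r1 < r2.
    """
    if not ack_flags:
        raise ValueError("no acknowledgement packets to evaluate")
    heavy_ratio = sum(1 for f in ack_flags if f == HEAVY) / len(ack_flags)
    if heavy_ratio > r2:
        return HEAVY
    if heavy_ratio >= r1:
        return LIGHT
    return NOT_CONGESTED
```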
  • the intermediate device 102 may use the above process to determine and record the congestion degree of each path according to the congestion flags in the acknowledgement packets.
  • the intermediate device 102 can record the identity of the data message, the information of the path used, and the degree of path congestion.
  • the intermediate device 102 modifies the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packet with the first identifier in the first data packet.
  • the modified first confirmation message is sent to the sending device 101.
  • the TCP protocol includes a window value field in the message header for flow control.
  • the receiving device sends an ACK packet to the sending device, and notifies the sending device of the number of bytes that the receiving device can receive through the window value in the ACK packet.
  • the sending device adjusts the sending rate of the data packet according to the window value in the ACK packet.
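  • To make the window field concrete, the sketch below rewrites the 16-bit window value of a TCP header (bytes 14 and 15 in the RFC 793 layout) and adjusts the checksum (bytes 16 and 17) incrementally as described in RFC 1624. The function names are hypothetical, and the application does not say how an implementation keeps the checksum valid; the incremental update is shown here only as one plausible approach.

```python
import struct

def _ones_complement_add(a, b):
    """16-bit one's complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

def rewrite_window(tcp_header: bytes, new_window: int) -> bytes:
    """Return a copy of the TCP header with a new window value and fixed checksum."""
    (old_window,) = struct.unpack_from("!H", tcp_header, 14)
    (old_csum,) = struct.unpack_from("!H", tcp_header, 16)
    # RFC 1624, equation 3: HC' = ~(~HC + ~m + m')
    s = _ones_complement_add(~old_csum & 0xFFFF, ~old_window & 0xFFFF)
    s = _ones_complement_add(s, new_window & 0xFFFF)
    new_csum = ~s & 0xFFFF
    out = bytearray(tcp_header)
    struct.pack_into("!HH", out, 14, new_window & 0xFFFF, new_csum)
    return bytes(out)
```

  • Because the checksum is a sum over 16-bit words, the incremental update stays valid regardless of the pseudo-header and payload, which are not touched.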
  • for details, refer to IETF RFC 793 (TRANSMISSION CONTROL PROTOCOL, DARPA INTERNET PROGRAM PROTOCOL SPECIFICATION).
  • the intermediate device 102 modifies the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packet having the first identifier in the first data packet.
  • the intermediate device 102 can separately count the number of bytes of the data packet having each identifier in the data packet, so as to be based on the congestion degree of the path and the number of bytes of the data packet having each identifier. Determine different window values. For example, when the first data packet is sent by the intermediate device 102 in step S201, the number of bytes of the data packet having the first identifier in the first data packet may be counted.
  • the intermediate device 102 compares the counted number of bytes with a set threshold f1 to distinguish whether the data packets having the first identifier belong to an elephant stream (elephant flow, that is, a very large continuous flow) or a mouse stream (mice flow, that is, a short flow).
  • the intermediate device 102 counts the number of bytes of the data packets having the first identifier in the first data packet. When the number of bytes is greater than the threshold f1, the data packets having the first identifier belong to an elephant stream; when the number of bytes is less than or equal to the threshold f1, the data packets having the first identifier belong to a mouse stream.
  • the intermediate device 102 performs cumulative counting on the number of bytes of the data packets having the respective identifiers, and obtains an accumulated number of bytes according to each identifier.
  • the process of accumulating the number of bytes per identification ends after the intermediate device 102 receives an acknowledgement message including the identity and the FIN flag (see IETF RFC 793).
  • the intermediate device 102 determines the identifier of the corresponding data packets according to the identifier in the acknowledgement packet that includes the FIN flag, and then either clears the accumulated number of bytes of the data packets having that identifier or deletes the record entry for that identifier.
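  • The per-identifier byte accounting described above can be sketched as a small counter. The class and method names are hypothetical, identifiers are represented as plain four-tuples, and deleting the entry (rather than zeroing it) on a FIN acknowledgement is one of the two options the text allows.

```python
class FlowByteCounter:
    """Accumulates bytes per data packet identifier and classifies the stream."""

    def __init__(self, f1):
        self.f1 = f1          # threshold separating elephant and mouse streams
        self._bytes = {}

    def on_data(self, flow_id, nbytes):
        self._bytes[flow_id] = self._bytes.get(flow_id, 0) + nbytes

    def is_elephant(self, flow_id):
        # Strictly greater than f1 means elephant; otherwise mouse.
        return self._bytes.get(flow_id, 0) > self.f1

    def on_fin_ack(self, ack_id):
        # The acknowledgement carries the reversed identifier of the data packets.
        src_ip, src_port, dst_ip, dst_port = ack_id
        self._bytes.pop((dst_ip, dst_port, src_ip, src_port), None)

counter = FlowByteCounter(f1=1000)
flow = ("192.0.2.1", 40000, "198.51.100.6", 80)
counter.on_data(flow, 600)
counter.on_data(flow, 600)
```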
  • the intermediate device 102 can adjust the transmission rates of the elephant stream and the mouse stream differently according to the congestion degree of the first path, which helps to achieve more accurate and effective congestion control.
  • This adjustment is achieved by modifying the window value of the confirmation message used to confirm the data message in the elephant stream or the mouse stream. For example, you can modify the window value as follows:
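  • if the congestion degree of the first path is light congestion and the data packets having the first identifier in the first data packet belong to an elephant stream, the intermediate device 102 reduces the window value in the acknowledgement packets having the second identifier in the first acknowledgement packet, where the first identifier and the second identifier have the foregoing correspondence.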
  • the acknowledgement message with the second identifier in the first acknowledgement message may include multiple acknowledgement messages, each acknowledgement message includes a window value, and the window value in each acknowledgement message is reduced.
  • the specific window value may be calculated, for example, in the same way as the cwnd value in the DCTCP congestion control algorithm; for the calculation, refer to Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters, draft-ietf-tcpm-dctcp-03, published by the IETF, the entire contents of which are incorporated herein by reference.
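  • For orientation, the DCTCP window reduction keeps a running estimate alpha of the fraction of marked traffic and shrinks the window in proportion to it. The sketch below follows that published rule; applying it directly to the window value carried in an acknowledgement packet, the function name, and the default gain g = 1/16 are assumptions, since the application only refers to the DCTCP calculation by example.

```python
def dctcp_update(alpha, marked_fraction, window, g=1 / 16):
    """One DCTCP-style update step.

    alpha           -- running estimate of the congested fraction (0.0 to 1.0)
    marked_fraction -- fraction of traffic marked as congested in the last interval
    window          -- current window value in bytes
    """
    alpha = (1 - g) * alpha + g * marked_fraction
    new_window = int(window * (1 - alpha / 2))
    return alpha, new_window
```

  • A fully congested estimate (alpha equal to 1) halves the window, whereas an estimate of 0 leaves it unchanged.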
  • in this way, when the path is lightly congested, the intermediate device 102 preferentially reduces the transmission rate of the elephant stream, thereby preserving the transmission rate of the more numerous mouse streams, which helps to reduce the average flow completion time (FCT).
  • if the congestion degree of the first path is heavy congestion, the intermediate device 102 looks for a second path from the intermediate device 102 to the receiving device 106 whose congestion degree is not heavy (that is, not congested or lightly congested). If no such second path exists (that is, there is no path to switch to), the intermediate device 102 reduces the window values in all acknowledgement packets that confirm data packets sent over the first path.
  • the calculation of the specific window value uses, for example, the calculation of the cwnd value in the congestion control algorithm of DCTCP. Thus, the transmission rate of all data messages (including the elephant stream and the mouse stream) on the first path is reduced.
  • if such a second path exists, the intermediate device 102 enters a switching state, selects the second path as the new transmission path for the elephant stream, and performs a switching process on the elephant stream.
  • the intermediate device 102 sets the window value to 0 in the acknowledgement packets that confirm the data packets belonging to the elephant stream in the first data packet, and starts a timer.
  • for example, if the data packets having the first identifier in the first data packet belong to the elephant stream, the window value in the acknowledgement packets having the second identifier in the first acknowledgement packet is set to 0 (the acknowledgement packets having the second identifier in the first acknowledgement packet may include multiple acknowledgement packets, each of which includes a window value, and the window value in each of them is set to 0).
  • on receiving the acknowledgement packets whose window value is 0, the sending device 101 determines that the number of bytes that the receiving device 106 can receive is 0 and temporarily stops sending data packets having the first identifier, so that the intermediate device 102 can perform the switching process.
  • setting the window value of the acknowledgement packets to 0 makes the sending device temporarily stop sending data packets and thereby creates an opportunity for the switchover.
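  • The window modification cases described above can be summarised as a small decision function. This is only a sketch: the names are hypothetical, congestion degrees are passed as plain strings, and leaving the window unchanged for mouse streams when the first path is lightly congested, or heavily congested with a second path available, is an assumption about cases the text does not spell out.

```python
from enum import Enum

class Action(Enum):
    NONE = "leave the window value unchanged"
    REDUCE_FLOW = "reduce the window for this identifier"
    REDUCE_ALL = "reduce the window for every identifier on the path"
    ZERO_AND_SWITCH = "set the window to 0, start the timer, switch paths"

def choose_action(path_degree, is_elephant, second_path_available):
    """Pick the window adjustment for the acknowledgement packets of one stream."""
    if path_degree == "light":
        return Action.REDUCE_FLOW if is_elephant else Action.NONE
    if path_degree == "heavy":
        if not second_path_available:
            return Action.REDUCE_ALL
        # Assumption: only elephant streams are moved to the second path.
        return Action.ZERO_AND_SWITCH if is_elephant else Action.NONE
    return Action.NONE
```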
  • as a result, the data packets still in transit on the first path have time to reach the receiving device 106 first, after which the intermediate device 102 sends another data packet having the first identifier to the receiving device 106 via the second path (the other data packet having the first identifier may include multiple data packets).
  • if, before the timer expires, the intermediate device 102 receives a second acknowledgement packet sent by the receiving device 106, and the second acknowledgement packet indicates that all data packets having the first identifier in the first data packet (assumed to belong to the elephant stream) have been received by the receiving device 106, the intermediate device 102 performs the switching process directly. This helps to reduce the waiting time.
  • the switching process described above includes transmitting, to the sending device 101, an acknowledgment message having a second identifier, where the window value in the acknowledgment message is restored to a window value before entering the switch state (the acknowledgment message having the second identifier may be A plurality of acknowledgement messages are included, each acknowledgement message includes a window value, and the window value in each acknowledgement message is restored to the window value before entering the switch state).
  • after receiving the acknowledgement packets having the second identifier with the restored window value, the sending device 101 resumes sending another data packet having the first identifier to the intermediate device 102.
  • the other data packet having the first identifier may include multiple data packets.
  • the intermediate device 102 sends the another data packet with the first identifier to the receiving device 106 via the second path. Thereby, the elephant stream is switched to the second path for transmission.
  • if, before the timer expires, the intermediate device 102 does not receive an acknowledgement packet indicating that all data packets having the first identifier in the first data packet (assumed to belong to the elephant stream) have been received by the receiving device 106, the intermediate device 102 performs the switching process after the timer expires. This switching process is the same as the one performed before the timer expires.
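  • The two triggers for the switching process (early confirmation or timer expiry) can be sketched as a small state holder. The class name, the explicit `now` arguments used in place of a real timer, and the sample values are assumptions for illustration.

```python
class FlowSwitch:
    """Tracks one elephant stream while it is being moved to the second path."""

    def __init__(self, saved_window, timeout, now):
        self.saved_window = saved_window  # window value before entering the switching state
        self.deadline = now + timeout     # moment at which the timer expires
        self.switched = False

    def advertised_window(self):
        # 0 pauses the sender; the saved value lets it resume after the switch.
        return self.saved_window if self.switched else 0

    def on_event(self, now, all_acked=False):
        """Switch when every outstanding packet is confirmed or the timer has expired."""
        if not self.switched and (all_acked or now >= self.deadline):
            self.switched = True
        return self.switched

early = FlowSwitch(saved_window=65535, timeout=0.5, now=10.0)
late = FlowSwitch(saved_window=65535, timeout=0.5, now=10.0)
```

  • In this sketch, `early.on_event(10.1, all_acked=True)` switches before the deadline, while `late` switches only once `now` reaches 10.5.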
  • the sending device 101 sends a second data packet to the intermediate device 102.
  • the sending device 101 acquires the window value in the first acknowledgement message sent by the intermediate device 102, adjusts the sending rate according to the window value, and sends the second data packet to the intermediate device 102. Adjusting the transmission rate is, for example, adjusting the transmission window value according to the window value in the first confirmation message.
  • the sending device in the TCP protocol can dynamically adjust the sending window size according to the window value in the ACK packet returned by the receiving device.
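  • On the sending side, standard TCP limits the amount of unacknowledged data to the smaller of the congestion window and the window value advertised in the ACK packet. The sketch below shows that relationship; the function and parameter names are illustrative.

```python
def sendable_bytes(cwnd, advertised_window, in_flight):
    """Bytes the sender may still transmit under standard TCP flow control.

    cwnd              -- the sender's own congestion window, in bytes
    advertised_window -- window value taken from the latest ACK packet
    in_flight         -- bytes sent but not yet acknowledged
    """
    return max(0, min(cwnd, advertised_window) - in_flight)
```

  • An advertised window of 0 therefore yields 0 sendable bytes whatever the congestion window is, which is the behaviour the switching process relies on.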
  • for the operations performed by the transmitting device 101 during the switching process, refer to the description in S205.
  • the intermediate device 102 transmits the data message having the first identifier (assuming belonging to the elephant stream) in the data message from the transmitting device 101 to the receiving device 106 via the second path. After the switching process, the intermediate device 102 can still send the data packet with other identifiers in the data packet from the sending device 101 to the receiving device 106 via the first path.
  • the transmitting device 101 only needs to send a data packet to the intermediate device 102, and adjusts the sending rate according to the window value in the acknowledgment message sent by the intermediate device 102 (ie, adjusts the sending window size).
  • the processing on the above sending device 101 follows the existing TCP protocol, and no technical changes are required, so that the method has good compatibility and convenience.
  • FIG. 3 is a structural diagram of a network device 300 according to an embodiment of the present application.
  • Network device 300 may be intermediate device 102 shown in FIG. 1, performing the operations performed by intermediate device 102 in the method illustrated in FIG. 2.
  • the network device 300 includes a receiving unit 301, a transmitting unit 302, and a processing unit 303.
  • the receiving unit 301 is configured to receive the first data packet sent by the sending device 101, and is further configured to receive the first acknowledgement packet sent by the receiving device 106 for confirming the first data packet.
  • the sending unit 302 is configured to send the first data packet to the receiving device 106 via the first path (including, for example, the intermediate devices 103, 104, and 105).
  • The processing unit 303 is configured to determine a congestion degree of the first path according to the congestion flag in the first acknowledgement packet, and to modify the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port (eg, an IP address of the transmitting device 101, a port on the transmitting device 101, an IP address of the receiving device 106, and a port on the receiving device 106).
  • the sending unit 302 is further configured to send the modified first acknowledgement message to the sending device 101.
  • The window value is used to inform the transmitting device 101 of the number of bytes that the receiving device 106 can receive.
  • The manner in which the processing unit 303 modifies the window value in the first acknowledgement message may include the following:
  • The processing unit 303 decreases the window value in the acknowledgement message having the second identifier in the first acknowledgement message.
  • The second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the first identifier and the second identifier have the aforementioned correspondence relationship.
  • The processing unit 303 decreases the window values in all acknowledgement messages in the first acknowledgement message.
  • The processing unit 303 sets the window value of the acknowledgement message having the second identifier in the first acknowledgement message to 0, and starts a timer.
  • Before the timer expires, if the receiving unit 301 receives the second acknowledgement message sent by the receiving device 106, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device 106, the sending unit 302 sends, via the second path to the receiving device 106, the data packets having the first identifier in the third data packet received by the receiving unit 301 from the sending device 101.
  • The sending unit 302 sends, via the second path to the receiving device 106, the data packets having the first identifier in the second data packet received by the receiving unit 301 from the sending device 101.
  • the above receiving unit 301 and transmitting unit 302 can be integrated into one unit, and the functions of the above receiving unit 301 and transmitting unit 302 are completed by the unit.
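  • Modifying the window value in an acknowledgement message, as the processing unit 303 does, amounts to rewriting the 16-bit Window field of the TCP header, after which the TCP checksum must be corrected. The following is a minimal sketch of one way to do this, not the implementation of the application: it assumes a raw TCP header held in a bytes object and uses the incremental checksum update of RFC 1624 (eqn. 3). The function names are hypothetical. The incremental update is valid no matter which other bytes (pseudo-header, payload) the original checksum covered.

```python
import struct

WINDOW_OFFSET = 14    # byte offset of the 16-bit Window field in a TCP header
CHECKSUM_OFFSET = 16  # byte offset of the 16-bit Checksum field


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum of data (even length assumed)."""
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def rewrite_window(tcp_header: bytes, new_window: int) -> bytes:
    """Return a copy of tcp_header with Window replaced and Checksum patched."""
    hdr = bytearray(tcp_header)
    (old_window,) = struct.unpack_from("!H", hdr, WINDOW_OFFSET)
    (old_csum,) = struct.unpack_from("!H", hdr, CHECKSUM_OFFSET)
    # RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), in ones' complement arithmetic.
    total = (~old_csum & 0xFFFF) + (~old_window & 0xFFFF) + new_window
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    new_csum = ~total & 0xFFFF
    struct.pack_into("!H", hdr, WINDOW_OFFSET, new_window)
    struct.pack_into("!H", hdr, CHECKSUM_OFFSET, new_csum)
    return bytes(hdr)
```

  • Patching the checksum incrementally avoids re-reading the whole segment, which matters when the rewrite is done on the forwarding path.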
  • FIG. 4 is a structural diagram of a network device 400 according to an embodiment of the present application.
  • Network device 400 may be intermediate device 102 shown in FIG. 1, performing the operations performed by intermediate device 102 in the method illustrated in FIG. 2.
  • Network device 400 includes a receiver 401, a transmitter 402, and a processor 403.
  • the receiver 401 is configured to receive the first data packet sent by the sending device 101, and is further configured to receive a first acknowledgement packet sent by the receiving device 106 for confirming the first data packet.
  • the transmitter 402 is configured to send the first data message to the receiving device 106 via the first path (including, for example, the intermediate devices 103, 104, and 105).
  • The processor 403 is configured to determine a congestion degree of the first path according to the congestion flag in the first acknowledgement packet, and to modify the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port (eg, an IP address of the transmitting device 101, a port on the transmitting device 101, an IP address of the receiving device 106, and a port on the receiving device 106).
  • the transmitter 402 is further configured to send the modified first acknowledgement message to the sending device 101.
  • The window value is used to inform the transmitting device 101 of the number of bytes that the receiving device 106 can receive.
  • The manner in which the processor 403 modifies the window value in the first acknowledgement message may include the following:
  • The processor 403 decreases the window value in the acknowledgement message having the second identifier in the first acknowledgement message.
  • The second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the first identifier and the second identifier have the aforementioned correspondence relationship.
  • The processor 403 decreases the window values in all acknowledgement messages in the first acknowledgement message.
  • The processor 403 sets the window value of the acknowledgement message having the second identifier in the first acknowledgement message to 0, and starts a timer.
  • Before the timer expires, if the receiver 401 receives the second acknowledgement message sent by the receiving device 106, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device 106, the transmitter 402 sends, via the second path to the receiving device 106, the data packets having the first identifier in the third data packet received by the receiver 401 from the sending device 101.
  • the transmitter 402 sends the data packet with the first identifier in the second data packet received by the receiver 401 from the sending device 101 to the receiving device 106 via the second path.
  • the receiver 401 and the transmitter 402 can communicate with the processor 403 via a bus or directly.
  • the receiver 401 and the transmitter 402 can be integrated into a component that performs the functions of the above receiver 401 and transmitter 402, such as a network interface.
  • The network interface is, for example, an Ethernet interface, an Asynchronous Transfer Mode (ATM) interface, or a Packet over SONET/SDH (POS) interface.
  • The processor 403 includes, but is not limited to, one or more of a central processing unit (CPU), a network processor (NP), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD).
  • The above PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • FIG. 5 is a schematic structural diagram of a network according to an embodiment of the present application.
  • the network includes a transmitting device 501, at least one intermediate device (such as intermediate devices 502, 503, 505, and 506 in FIG. 5) and a receiving device 504.
  • the transmitting device 501 transmits a data message to the receiving device 504 through the intermediate device.
  • the transmitting device 501 includes a processor 5011 and a network interface controller NIC 5012.
  • The network interface controller NIC 5012 may also be referred to as a network adapter, a network interface card, or a network card.
  • the processor 5011 on the transmitting device 501 can perform a TCP process to establish a TCP connection with the receiving device 504 via the NIC 5012.
  • the data messages sent by the processor 5011 first arrive at the NIC 5012 and then arrive at the receiving device 504 via the first path.
  • the first path may include zero, one or more intermediate devices (eg, intermediate devices 502 and 503 in FIG. 5).
  • The acknowledgment message (for example, an ACK message) sent by the receiving device 504 to confirm the data message sent by the sending device 501 can reach the NIC 5012 via the first path, and is then sent to the processor 5011 by the NIC 5012.
  • Alternatively, the acknowledgment message may arrive at the NIC 5012 via a second path different from the first path, and is then sent by the NIC 5012 to the processor 5011.
  • the second path may include zero, one or more intermediate devices (eg, intermediate devices 506 and 505 in FIG. 5).
  • the acknowledgment message for confirming the data message sent by the transmitting device 501 is sent to the processor 5011 via the NIC 5012.
  • the network device is applied to the data center.
  • The sending device 501 can be a host or a server, and the receiving device 504 can be a host or a server.
  • the intermediate device can be a switch, a router, a virtual switch, or the like.
  • the intermediate device 502 and the intermediate device 505 may also be a top of rack (TOR) switch, and the intermediate device 503 and the intermediate device 506 may also be TOR switches.
  • FIG. 6 is a flowchart of a congestion control method according to an embodiment of the present application. This method can be applied to the network structure shown in FIG. 5. The method comprises the following steps:
  • the network interface controller 5012 receives the first data packet sent by the processor 5011, and sends the first data packet to the receiving device 504 via the first path.
  • The first path may include zero, one or more intermediate devices, such as intermediate devices 502 and 503 shown in FIG. 5.
  • The network interface controller NIC 5012 selects the transmission path of the first data packet, and uses source routing to carry the information of that path in the data packet it sends.
  • the information of the path is, for example, an IP address of each intermediate device included in the first path through which the first data packet passes.
  • the NIC 5012 adds the IP address of the intermediate device (eg, the IP addresses of the intermediate devices 502 and 503) to be passed in the option field of the IP packet header of the first data message. Therefore, each intermediate device on the first path can forward the data packet according to the IP address in the option field of the IP packet header.
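  • As a minimal sketch of carrying path information in the option field of the IP header, the following builds an IPv4 Loose Source and Record Route (LSRR) option as defined in RFC 791. The application does not say which source-route option is used, so the choice of LSRR is an assumption, and the function name and addresses are hypothetical.

```python
import ipaddress

LSRR = 131  # Loose Source and Record Route option type (RFC 791)


def build_source_route_option(hops: list[str]) -> bytes:
    """Encode intermediate-device IPv4 addresses as an IPv4 LSRR option."""
    route = b"".join(ipaddress.IPv4Address(h).packed for h in hops)
    length = 3 + len(route)  # type + length + pointer + route data
    pointer = 4              # the first address starts at octet 4 of the option
    option = bytes([LSRR, length, pointer]) + route
    # Pad with End-of-Option-List (0) octets to a 32-bit boundary.
    option += b"\x00" * (-len(option) % 4)
    return option


# Hypothetical addresses standing in for intermediate devices 502 and 503.
print(build_source_route_option(["10.0.0.2", "10.0.0.3"]).hex())
```

  • Each 4-byte entry in the route data is one hop, so two intermediate devices yield an 11-byte option padded to 12 bytes.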
  • the first data packet may include multiple data packets, and the multiple data packets may have different identifiers.
  • the plurality of data messages can be distinguished by identification.
  • the identifier includes, for example, a source IP address, a source port, a destination IP address, and a destination port.
  • the identification may, for example, also include a transport layer protocol number (the transport layer protocol number of TCP is 6), whereby the identity constitutes a five-tuple.
  • the NIC 5012 may record the correspondence between the identifier of each data packet in the first data packet and the first path.
  • For example, the NIC 5012 records the first identifier of the data packets having the first identifier in the first data packet received from the processor 5011 (for example, including the source IP address of the sending device 501, the source port on the sending device 501, the destination IP address of the receiving device 504, and the destination port on the receiving device 504) together with the corresponding information of the first path (for example, including the IP addresses of the intermediate devices 502 and 503).
  • the other identifier of the data packet with the other identifier in the first data packet may also be recorded (for example, including the source IP address of the sending device 501, another source port on the sending device 501, the destination IP address of the receiving device 504, and Another destination port on the receiving device 504) and corresponding information of the first path (eg, including the IP addresses of the intermediate devices 502 and 503).
  • the NIC 5012 can record the correspondence between the identifiers of all the data packets sent through the first path and the first path.
  • The data packets sent through the first path may further include data packets sent by the sending device 501 to a receiving device other than the receiving device 504 (that is, the receiving end is not the receiving device 504, but the data packets are still sent through the first path).
  • the NIC 5012 can also record the identifier of the data message sent to the receiving device 504 via other paths and the information of the corresponding path.
  • the NIC 5012 records the correspondence between the identifier of the data packet and the path, and the path corresponding to the identifier can be determined according to the identifier of the data packet (for example, the first path is determined according to the first identifier).
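  • The correspondence between identifiers and paths can be pictured as a small lookup table. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that the identifier of an acknowledgement is the identifier of the data packet with source and destination swapped, as is the case for TCP.

```python
from typing import NamedTuple, Optional


class FlowId(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def swapped(self) -> "FlowId":
        """Identifier expected on the matching ACK (assumed: endpoints swapped)."""
        return FlowId(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


Path = tuple[str, ...]  # IP addresses of the intermediate devices on a path


class PathTable:
    """Maps the identifier of a data packet to the path it was sent on."""

    def __init__(self) -> None:
        self._paths: dict[FlowId, Path] = {}

    def record(self, flow: FlowId, path: Path) -> None:
        self._paths[flow] = path

    def path_for_data(self, flow: FlowId) -> Optional[Path]:
        return self._paths.get(flow)

    def path_for_ack(self, ack_id: FlowId) -> Optional[Path]:
        # Map the ACK's identifier back to the data packet's identifier first.
        return self._paths.get(ack_id.swapped())
```

  • Looking up by the swapped identifier is what lets the NIC attribute a congestion flag seen in an ACK to the path the data packet took, even when the ACK returns over a different path.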
  • the intermediate device on the first path sets a congestion flag in the first data packet.
  • Each intermediate device on the first path can determine the degree of congestion according to the length of the queue that holds the first data packet on that device, and choose whether to update the congestion flag according to the degree of congestion.
  • the queue length is, for example, the length of the data packet to be transmitted buffered in the transmission buffer on each intermediate device.
  • the method for determining the degree of congestion and updating the congestion flag can be referred to in S202, and details are not described herein again.
  • intermediate devices on other paths can also set congestion flags for data packets sent over other paths in the above manner.
  • the NIC 5012 may also determine the congestion level according to the queue length of the current NIC 5012 for storing the first data packet, and update the congestion flag in the first data packet by using the congestion flag corresponding to the congestion level.
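  • A minimal sketch of deriving a congestion flag from queue length follows. The details of S202 are not repeated in this part of the description, so everything concrete here is an assumption made for illustration: the three levels, the threshold values, and the rule that a device only overwrites the flag when its own congestion is worse than what the packet already carries.

```python
from enum import IntEnum


class Congestion(IntEnum):
    NONE = 0
    LIGHT = 1
    SEVERE = 2


LIGHT_THRESHOLD = 20   # hypothetical queue length, in packets
SEVERE_THRESHOLD = 65  # hypothetical queue length, in packets


def local_congestion(queue_len: int) -> Congestion:
    """Congestion degree seen by one device, from its transmit queue length."""
    if queue_len >= SEVERE_THRESHOLD:
        return Congestion.SEVERE
    if queue_len >= LIGHT_THRESHOLD:
        return Congestion.LIGHT
    return Congestion.NONE


def update_flag(current_flag: Congestion, queue_len: int) -> Congestion:
    """Keep the worst congestion degree seen so far along the path (assumed rule)."""
    return max(current_flag, local_congestion(queue_len))
```

  • Under this assumed rule the flag that reaches the receiving device reflects the most congested hop on the path.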
  • the receiving device 504 sends a first acknowledgement message to the network interface controller 5012.
  • After receiving the first data packet, the receiving device 504 obtains the congestion flag in the first data packet, and generates a first acknowledgement packet.
  • For the manner of generation and the content of the first acknowledgement message, refer to the description in S203; details are not described herein again.
  • the first acknowledgement packet may arrive at the NIC 5012 through the same path (ie, the first path) that the first data packet is sent through.
  • the first acknowledgement message may also arrive at the NIC 5012 through a second path that is different from the first path.
  • The second path may include zero, one or more intermediate devices, such as intermediate devices 506 and 505 shown in FIG. 5.
  • the data message sent by the NIC 5012 to the receiving device 504 via other paths also has a corresponding acknowledgement message sent by the receiving device 504 to the NIC 5012.
  • the identifier of the data packet sent by the other path and the identifier of the corresponding acknowledgement packet also have the corresponding relationship described in S203 above.
  • These acknowledgment messages sent by the receiving device 504 may also carry the congestion flag in the manner described for the first acknowledgment message in S203, and may be returned via the same path as the corresponding data packets or via a different path.
  • the NIC 5012 determines a congestion degree of the first path according to the congestion flag in the first acknowledgement message.
  • the NIC 5012 may determine the first identifier according to the second identifier in the first acknowledgement packet, and further determine the information of the first path according to the first identifier. Further, the NIC 5012 can calculate the congestion level according to the number or proportion of the acknowledgement packets indicating the congestion level in the first acknowledgement message. For the manner of determining the congestion degree of the first path, refer to the description in S204, and details are not described herein again.
  • The NIC 5012 may use the above process to determine the congestion level of each path according to the congestion flag in the acknowledgment message, and record it.
  • the NIC 5012 can record the identity of the data message, the information of the path used, and the degree of path congestion.
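  • The idea of computing the congestion degree of a path from the proportion of acknowledgement packets that indicate congestion can be sketched as follows. The ratio thresholds and the level names are hypothetical; the description of S204 is not repeated here, so this is one plausible reading rather than the method of the application.

```python
def path_congestion(marked: list[bool],
                    light_ratio: float = 0.1,
                    severe_ratio: float = 0.5) -> str:
    """Classify a path from per-ACK congestion marks.

    marked[i] is True when the i-th acknowledgement for packets sent on
    this path carried a congestion flag.
    """
    if not marked:
        return "none"  # no feedback yet, treat the path as uncongested
    fraction = sum(marked) / len(marked)
    if fraction >= severe_ratio:
        return "severe"
    if fraction >= light_ratio:
        return "light"
    return "none"


print(path_congestion([True, False, False, False, False]))
```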
  • The NIC 5012 modifies the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet, and sends the modified first acknowledgement message to the processor 5011.
  • the NIC 5012 modifies the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packet having the first identifier in the first data packet.
  • The NIC 5012 can separately count the number of bytes of the data packets having each identifier, so as to determine different window values according to the congestion degree of the path and the number of bytes of the data packets having each identifier.
  • For example, when the first data packet is sent by the NIC 5012 in step S601, the number of bytes of the data packets having each identifier in the first data packet is counted.
  • the NIC 5012 compares the counted number of bytes with a threshold f1 to distinguish whether the data message having the respective identification belongs to an elephant stream or a mouse stream. For the manner in which the NIC 5012 distinguishes whether the data packet belongs to the elephant stream or the mouse stream, refer to the description in S205, and details are not described herein again.
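  • The per-identifier byte counting and the comparison with the threshold f1 can be sketched as below. The value chosen for f1, the strict "greater than" comparison, and the class name are assumptions for illustration; the identifier is represented as a plain tuple.

```python
from collections import Counter

F1_BYTES = 100_000  # hypothetical value for the threshold f1


class FlowClassifier:
    """Counts bytes sent per identifier and labels elephant streams."""

    def __init__(self, threshold: int = F1_BYTES) -> None:
        self.threshold = threshold
        self.bytes_sent: Counter = Counter()

    def on_send(self, flow_id: tuple, nbytes: int) -> None:
        # Called for every data packet handed to the first path.
        self.bytes_sent[flow_id] += nbytes

    def is_elephant(self, flow_id: tuple) -> bool:
        # Streams whose byte count exceeds f1 are elephants, the rest are mice.
        return self.bytes_sent[flow_id] > self.threshold


clf = FlowClassifier()
bulk = ("10.0.0.1", 40000, "10.0.1.9", 80)
for _ in range(100):
    clf.on_send(bulk, 1460)
print(clf.is_elephant(bulk))
```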
  • The NIC 5012 can adjust the transmission rate of the elephant stream or the mouse stream according to the different congestion levels of the first path. This adjustment is achieved by modifying the window value of the acknowledgement message used to confirm the data messages in the elephant stream or the mouse stream. For example, the window value can be modified as follows:
  • the window value in the acknowledgement message with the second identifier in the first acknowledgement message is decreased.
  • the NIC 5012 looks for a second path from the NIC 5012 to the receiving device 504 for which the degree of congestion is not severe (the degree of congestion is non-congested or lightly congested). If the second path does not exist (ie, there is no switchable path), the NIC 5012 lowers all window values in the acknowledgment message for acknowledging the data message sent over the first path.
  • the calculation of the specific window value uses, for example, the calculation of the cwnd value in the congestion control algorithm of DCTCP.
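  • For reference, the DCTCP window calculation mentioned above can be sketched as follows, using the update rules given in RFC 8257: a running estimate alpha of the fraction of marked packets, and a reduction of the window by the factor (1 - alpha/2). How the NIC 5012 maps the result onto the window field of an ACK is not specified here, so the function names and the idea of applying the factor directly to a window value are assumptions.

```python
def update_alpha(alpha: float, marked_fraction: float, g: float = 1 / 16) -> float:
    """DCTCP estimate update: alpha <- (1 - g) * alpha + g * F.

    marked_fraction (F) is the fraction of bytes or packets that were
    congestion-marked in the last observation window; g is the gain.
    """
    return (1 - g) * alpha + g * marked_fraction


def reduced_window(window: int, alpha: float) -> int:
    """DCTCP reduction: window <- window * (1 - alpha / 2)."""
    return int(window * (1 - alpha / 2))


alpha = update_alpha(0.0, 1.0)       # first fully marked window
print(alpha, reduced_window(60_000, alpha))
```

  • Because alpha tracks how much of the traffic is marked, the window shrinks only slightly under mild congestion and is halved only when every packet is marked.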
  • the NIC 5012 enters a switching state, selects the second path as a new transmission path for the elephant stream, and performs a switching process on the elephant stream.
  • the NIC 5012 sets a window value of 0 in the acknowledgment message for confirming the elephant stream in the first data message, and starts a timer. For example, the data packet with the first identifier in the first data packet belongs to the elephant stream, and the window value in the acknowledgement packet with the second identifier in the first acknowledgement packet is set to 0 (refer to the description in S205) .
  • After receiving the acknowledgment message with the window value of 0, the processor 5011 determines that the number of bytes that the receiving device 504 can receive is 0, and temporarily stops sending data packets with the first identifier, so that the NIC 5012 can perform the switching process. If, before the timer expires, the NIC 5012 receives the second acknowledgment message sent by the receiving device 504, and the second acknowledgment message indicates that all data packets having the first identifier in the first data packet (assumed to belong to the elephant stream) have been received by the receiving device 504, the NIC 5012 directly performs the switching process.
  • The switching process includes the following: the NIC 5012 sends an acknowledgment message with the second identifier to the processor 5011, and the window value in that acknowledgment message is restored to the window value before entering the switching state (refer to the description in S205 for details).
  • The processor 5011 then resumes sending further data packets with the first identifier to the NIC 5012.
  • These data packets with the first identifier, which may include multiple data packets, are sent by the NIC 5012 to the receiving device 504 via the second path. Thereby, the elephant stream is switched to the second path for transmission.
  • If the NIC 5012 does not receive, before the timer expires, an acknowledgment message indicating that all data packets having the first identifier in the first data packet (assumed to belong to the elephant stream) have been received by the receiving device 504, then after the timer expires the NIC 5012 performs the switching process, which is the same as the switching process performed before the timer expires.
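  • The switching behavior described above (advertise a window of 0, wait until all outstanding packets of the elephant stream are acknowledged or the timer expires, then restore the window and use the second path) can be summarized as a small state holder. This is a sketch under stated assumptions: the class and method names are hypothetical, and time is passed in explicitly so the logic stays deterministic.

```python
class ElephantSwitch:
    """Tracks one elephant stream while it is moved to the second path."""

    def __init__(self, saved_window: int, timeout: float, now: float) -> None:
        self.saved_window = saved_window  # window value before switching began
        self.deadline = now + timeout     # when the timer expires
        self.switched = False

    def window_for_ack(self, now: float, all_acked: bool) -> int:
        """Window value to write into the next ACK passed to the sender."""
        # Switch as soon as everything outstanding is acknowledged,
        # or unconditionally once the timer has expired.
        if not self.switched and (all_acked or now >= self.deadline):
            self.switched = True
        return self.saved_window if self.switched else 0

    def path(self, first_path: str, second_path: str) -> str:
        """Path on which to send packets of this stream."""
        return second_path if self.switched else first_path


s = ElephantSwitch(saved_window=5000, timeout=1.0, now=0.0)
print(s.window_for_ack(now=0.1, all_acked=False), s.path("first", "second"))
print(s.window_for_ack(now=0.2, all_acked=True), s.path("first", "second"))
```

  • Pausing the stream before moving it is what keeps packets of the same stream from being in flight on both paths at once.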
  • the processor 5011 sends a second data packet to the network interface controller 5012.
  • the processor 5011 obtains a window value in the first acknowledgement message sent by the network interface controller NIC 5012, adjusts the transmission rate according to the window value, and sends a second data packet to the NIC 5012. Adjusting the transmission rate is, for example, adjusting the transmission window value according to the window value in the first confirmation message.
  • the sending device in the TCP protocol can dynamically adjust the sending window size according to the window value in the ACK packet returned by the receiving device.
  • the execution operation of the processor 5011 during the switching process can refer to the description in S605.
  • the NIC 5012 transmits the data message having the first identification (assuming belonging to the elephant stream) in the data message from the processor 5011 to the receiving device 504 via the second path. After the switching process, the NIC 5012 can still send the data packet with other identifiers in the data packet from the processor 5011 to the receiving device 504 via the first path.
  • the processor 5011 only needs to send a data packet to the NIC 5012, and adjusts the transmission rate according to the window value in the acknowledgement message sent by the NIC 5012 (ie, adjusts the transmission window size).
  • the processing on the above processor 5011 follows the existing TCP protocol, and no technical modification is required.
  • The method can be implemented simply by providing the network interface controller NIC 5012 having the above functions on the transmitting device 501. Thus the method has better compatibility and convenience.
  • FIG. 7 is a structural diagram of a network interface controller 700 according to an embodiment of the present application.
  • The network interface controller 700 (i.e., NIC 700) may be the network interface controller NIC 5012 shown in FIG. 5, performing the operations performed by the network interface controller NIC 5012 in the method illustrated in FIG. 6.
  • the network interface controller 700 includes a receiving unit 701, a transmitting unit 702, and a processing unit 703.
  • The receiving unit 701 is configured to receive the first data packet sent by the processor 5011, and is further configured to receive a first acknowledgement packet that is sent by the receiving device 504 to confirm the first data packet.
  • the sending unit 702 is configured to send the first data packet to the receiving device 504 via the first path (including, for example, the intermediate devices 502 and 503).
  • The processing unit 703 is configured to determine a congestion degree of the first path according to the congestion flag in the first acknowledgement packet, and to modify the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port (eg, an IP address of the transmitting device 501, a port on the transmitting device 501, an IP address of the receiving device 504, and a port on the receiving device 504).
  • the sending unit 702 is further configured to send the modified first confirmation message to the processor 5011.
  • The window value is used to notify the processor 5011 of the number of bytes that the receiving device 504 can receive.
  • The manner in which the processing unit 703 modifies the window value in the first acknowledgement message may include the following:
  • The processing unit 703 decreases the window value in the acknowledgement message having the second identifier in the first acknowledgement message.
  • The second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the first identifier and the second identifier have the aforementioned correspondence relationship.
  • The processing unit 703 decreases the window values in all acknowledgement messages in the first acknowledgement message.
  • The processing unit 703 sets the window value of the acknowledgement message having the second identifier in the first acknowledgement message to 0, and starts a timer.
  • Before the timer expires, if the receiving unit 701 receives the second acknowledgement message sent by the receiving device 504, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device 504, the sending unit 702 sends, via the second path to the receiving device 504, the data packets having the first identifier in the third data packet received by the receiving unit 701 from the processor 5011.
  • the sending unit 702 sends the data packet with the first identifier in the second data packet received by the receiving unit 701 from the processor 5011 to the receiving device 504 via the second path.
  • the above receiving unit 701 and transmitting unit 702 can be integrated into one unit, and the functions of the above receiving unit 701 and transmitting unit 702 are completed by the unit.
  • FIG. 8 is a structural diagram of a network interface controller 800 according to an embodiment of the present application.
  • The network interface controller 800 (i.e., NIC 800) may be the network interface controller NIC 5012 shown in FIG. 5, performing the operations performed by the network interface controller NIC 5012 in the method illustrated in FIG. 6.
  • the network interface controller 800 includes a receiver 801, a transmitter 802, and a processor 803.
  • The receiver 801 is configured to receive the first data packet sent by the processor 5011, and is further configured to receive a first acknowledgement message that is sent by the receiving device 504 to confirm the first data packet.
  • the transmitter 802 is configured to send the first data packet to the receiving device 504 via the first path (including, for example, the intermediate devices 502 and 503).
  • The processor 803 is configured to determine a congestion degree of the first path according to the congestion flag in the first acknowledgement packet, and to modify the window value in the first acknowledgement packet according to the congestion degree of the first path and the number of bytes of the data packets having the first identifier in the first data packet.
  • the first identifier includes a source IP address, a source port, a destination IP address, and a destination port (e.g., an IP address of the transmitting device 501, a port on the transmitting device 501, an IP address of the receiving device 504, and a port on the receiving device 504).
  • the transmitter 802 is further configured to send the modified first acknowledgement message to the processor 5011.
  • The window value is used to notify the processor 5011 of the number of bytes that the receiving device 504 can receive.
  • The manner in which the processor 803 modifies the window value in the first acknowledgement message may include the following:
  • The processor 803 decreases the window value in the acknowledgement message having the second identifier in the first acknowledgement message.
  • The second identifier includes a source IP address, a source port, a destination IP address, and a destination port.
  • the first identifier and the second identifier have the aforementioned correspondence relationship.
  • The processor 803 decreases the window values in all acknowledgement messages in the first acknowledgement message.
  • The processor 803 sets the window value of the acknowledgement message having the second identifier in the first acknowledgement message to 0, and starts a timer.
  • Before the timer expires, if the receiver 801 receives the second acknowledgement message sent by the receiving device 504, and the second acknowledgement message indicates that all data packets having the first identifier in the first data packet have been received by the receiving device 504, the transmitter 802 sends, via the second path to the receiving device 504, the data packets having the first identifier in the third data packet received by the receiver 801 from the processor 5011.
  • the transmitter 802 sends the data packet with the first identifier in the second data packet received by the receiver 801 from the processor 5011 to the receiving device 504 via the second path.
  • the receiver 801 and the transmitter 802 can communicate with the processor 803 via a bus or directly.
  • the bus is, for example, a Peripheral Component Interconnect Express (PCI-E) bus.
  • the receiver 801 and the transmitter 802 can be integrated into a component that performs the functions of the above receiver 801 and transmitter 802, such as a network interface.
  • The network interface is, for example, an Ethernet interface, an Asynchronous Transfer Mode (ATM) interface, or a Packet over SONET/SDH (POS) interface.
  • The processor 803 includes, but is not limited to, one or more of a central processing unit (CPU), a network processor (NP), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD).
  • The above PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • embodiments of the present application can be provided as a method, apparatus (device), or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program is stored/distributed in a suitable medium, provided with other hardware or as part of the hardware, or in other distributed forms, such as over the Internet or other wired or wireless telecommunication systems.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • It should be understood that, in the embodiments of the present application, the sequence numbers of the foregoing methods do not imply an order of execution. The order of execution of each method should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

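This application provides a congestion control method, a network device, and a network interface controller thereof. In one congestion control method, a first intermediate device receives a first data packet sent by a sending device and sends the first data packet to a receiving device via a first path. The first intermediate device receives a first acknowledgment packet, sent by the receiving device, that acknowledges the first data packet, and determines the congestion level of the first path based on the congestion mark in the first acknowledgment packet. Based on the congestion level of the first path and the number of bytes of the data packets having a first identifier in the first data packet, the first intermediate device modifies the window value in the first acknowledgment packet and sends the modified first acknowledgment packet to the sending device. The solution adjusts the data packet transmission rate according to the congestion level of the communication path and the byte count of the data packets, which helps provide more precise and effective congestion control and improves network resource utilization.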

Description

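Congestion control method, network device, and network interface controller thereof
This application claims priority to Chinese Patent Application No. 201710340116.3, filed with the Chinese Patent Office on May 15, 2017 and entitled "Congestion control method, network device, and network interface controller thereof", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the communications field, and in particular to a congestion control method, a network device, and a network interface controller thereof.
Background
Current data centers use the Data Center Transmission Control Protocol (DCTCP) and Explicit Congestion Notification (ECN) for congestion control. During data transmission, a data packet sent by a sending device includes a congestion field. An intermediate network device decides, based on its current queue length, whether to set a congestion mark in the congestion field. The receiving device carries the congestion mark information in an acknowledgment (ACK) packet returned to the sending device. The sending device can determine from the congestion mark information in the acknowledgment packet whether congestion occurred while the data packet was being sent, and adjust its sending rate accordingly (for example, by reducing the send window size when congestion occurs). The congestion control used in this data transmission process is relatively simple and cannot provide more precise and effective congestion control.
Summary
This application provides a congestion control method, a network device, and a network interface controller thereof, which help provide more precise and effective congestion control and improve network resource utilization.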
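According to a first aspect, a congestion control method is provided. The method is applied to a data transmission network that includes a sending device, at least one intermediate device, and a receiving device. The sending device sends data packets to the receiving device through the intermediate device. The method includes:
A first intermediate device of the at least one intermediate device receives a first data packet sent by the sending device, and sends the first data packet to the receiving device via a first path;
The first intermediate device receives a first acknowledgment packet, sent by the receiving device, that acknowledges the first data packet, and determines the congestion level of the first path based on the congestion mark in the first acknowledgment packet;
The first intermediate device modifies the window value in the first acknowledgment packet based on the congestion level of the first path and the number of bytes of the data packets having a first identifier in the first data packet, and sends the modified first acknowledgment packet to the sending device. The window value notifies the sending device of the number of bytes that the receiving device can receive. The first identifier includes a source IP address, a source port, a destination IP address, and a destination port.
In this solution, the first intermediate device modifies the window value in the acknowledgment packet returned to the sending device, so that the sending device can adjust its sending rate according to that window value. The window value is modified differently depending on the congestion level of the path and the byte count of the data packets, which helps achieve more precise and effective congestion control and improves network resource utilization.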
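Optionally, modifying the window value in the first acknowledgment packet includes: if the congestion level of the first path is mild, and the number of bytes of the data packets having the first identifier in the first data packet is greater than a set threshold (that is, the data packets having the first identifier belong to an elephant flow), the first intermediate device reduces the window value in the acknowledgment packets having a second identifier in the first acknowledgment packet. The second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier. In this way, when the path is mildly congested, the rate of elephant flows is reduced and the rate of mice flows is preserved.
Optionally, modifying the window value in the first acknowledgment packet includes: if the congestion level of the first path is severe, and no path from the first intermediate device to the receiving device is mildly congested or not congested (that is, there is no other path from the first intermediate device to the receiving device, or all paths from the first intermediate device to the receiving device are severely congested), the first intermediate device reduces the window value in all acknowledgment packets in the first acknowledgment packet. In this way, when no alternative path exists, the sending device reduces the sending rate of all data packets on the first path (both elephant flows and mice flows).
Optionally, modifying the window value in the first acknowledgment packet includes: if the congestion level of the first path is severe, and the number of bytes of the data packets having the first identifier in the first data packet is greater than the set threshold (that is, the data packets having the first identifier belong to an elephant flow), and there is a second path from the first intermediate device to the receiving device that is mildly congested or not congested (that is, an alternative path exists), the first intermediate device sets the window value of the acknowledgment packets having the second identifier in the first acknowledgment packet to 0 and starts a timer. The second identifier includes a source IP address, a source port, a destination IP address, and a destination port. The source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier. The method further includes: after the timer expires, the first intermediate device sends, to the receiving device via the second path, the data packets having the first identifier in a second data packet received from the sending device. Setting the window value of the acknowledgment packets to 0 creates an opportunity to switch paths. Setting the timer gives packets in flight on the path time to reach the receiving device, which reduces the likelihood that the path switch causes packet reordering.
Optionally, before the timer expires, if the first intermediate device receives a second acknowledgment packet sent by the receiving device, and the second acknowledgment packet indicates that all data packets having the first identifier in the first data packet have been received by the receiving device, the first intermediate device sends, to the receiving device via the second path, the data packets having the first identifier in a third data packet received from the sending device. If all data packets sent by the sending device to the receiving device have already arrived before the timer expires (so that switching paths will not cause reordering), the first intermediate device performs the path switch directly, which helps reduce waiting time.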
第二方面,提供了一种拥塞控制方法。该方法应用于数据传输网络中。所述数据传输网络包括发送设备,至少一个中间设备,以及接收设备。所述发送设备包括处理器和网络接口控制器(英文:network interface controller,NIC)。所述发送设备通过所述中间设备向所述接收设备发送数据报文。所述方法包括:
所述网络接口控制器NIC接收所述处理器发送的第一数据报文,经第一路径向所述接收设备发送第一数据报文;
所述NIC接收所述接收设备发送的用于对所述第一数据报文进行确认的第一确认报文,根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度;
所述NIC根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值,将修改后的第一确认报文发送给所述处理器。所述窗口值用于通知所述处理器所述接收设备能够接收的字节数。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口。
以上方案中,所述NIC修改返回给处理器的确认报文中的窗口值,由此使得处理器可以根据确认报文中的窗口值调整发送速率。该窗口值根据路径的拥塞程度和数据报文的字节数进行不同的修改,有助于实现更为精确有效的拥塞控制。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则所述NIC降低所述第一确认报文中具有第二标识的确认报文中的窗口值。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口,所述第一标识中的源IP地址、源端口、目的IP地址、目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址、源端口相同。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且不存在从所述NIC到所述接收设备的拥塞程度为轻度或不拥塞的路径,则所述NIC降低所述第一确认报文中所有确认报文中的窗口值。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从所述NIC到所述接收设备的拥塞程度为轻度或不拥塞的第二路径,则所述NIC设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口,所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。所述方法还包括:在所述定时器超时后,所述NIC经所述第二路径向所述接收设备发送接收自所述处理器的第二数据报文中具有所述第一标识的数据报文。
可选地,在所述定时器超时前,如果所述NIC收到所述接收设备发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被所述接收设备接收,则所述NIC经所述第二路径向所述接收设备发送接收自所述处理器的第三数据报文中具有所述第一标识的数据报文。
第三方面,提供了一种网络设备。该网络设备应用于数据传输网络中。所述数据传输网络包括发送设备,至少一个中间设备,以及接收设备。所述网络设备为所述至少一个中间设备中的第一中间设备。所述发送设备通过所述中间设备向所述接收设备发送数据报文。所述网络设备包括接收单元、发送单元和处理单元。
接收单元,用于接收所述发送设备发送的第一数据报文,还用于接收所述接收设备发送的用于对所述第一数据报文进行确认的第一确认报文。
发送单元,用于经第一路径向所述接收设备发送所述第一数据报文。
处理单元,用于根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程 度,并根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口。
所述发送单元,还用于将修改后的第一确认报文发送给所述发送设备。所述窗口值用于通知所述发送设备所述接收设备能够接收的字节数。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则所述处理单元降低所述第一确认报文中具有第二标识的确认报文中的窗口值。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且不存在从所述网络设备到所述接收设备的拥塞程度为轻度或不拥塞的路径,则所述处理单元降低所述第一确认报文中所有确认报文中的窗口值。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从所述网络设备到所述接收设备的拥塞程度为轻度或不拥塞的第二路径,则所述处理单元设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。所述发送单元,还用于在所述定时器超时后,经所述第二路径向所述接收设备发送所述接收单元接收自所述发送设备的第二数据报文中具有所述第一标识的数据报文。
可选地,所述发送单元还用于,在所述定时器超时前,如果所述接收单元收到所述接收设备发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被所述接收设备接收,则经所述第二路径向所述接收设备发送所述接收单元接收自所述发送设备的第三数据报文中具有所述第一标识的数据报文。
第四方面,提供了一种网络接口控制器NIC。该NIC应用于数据传输网络中。所述数据传输网络包括发送设备,至少一个中间设备,以及接收设备。所述发送设备包括处理器和所述NIC。所述发送设备通过所述中间设备向所述接收设备发送数据报文。所述NIC包括接收单元、发送单元和处理单元。
接收单元,用于接收所述处理器发送的第一数据报文,还用于接收所述接收设备发送的用于对所述第一数据报文进行确认的第一确认报文。
发送单元,用于经第一路径向所述接收设备发送所述第一数据报文。
处理单元,用于根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度,并根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口。
所述发送单元,还用于将修改后的第一确认报文发送给所述处理器。所述窗口值用于通知所述处理器所述接收设备能够接收的字节数。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则所述处理单元降低所述第一确认报文中具有第二标识的确认报文中的窗口值。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且不存在从所述NIC到所述接收设备的拥塞程度为轻度或不拥塞的路径,则所述处理单元降低所述第一确认报文中所有确认报文中的窗口值。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从所述NIC到所述接收设备的拥塞程度为轻度或不拥塞的第二路径,则所述处理单元设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。所述发送单元,还用于在所述定时器超时后,经所述第二路径向所述接收设备发送所述接收单元接收自所述处理器的第二数据报文中具有所述第一标识的数据报文。
可选地,所述发送单元,还用于在所述定时器超时前,如果所述接收单元收到所述接收设备发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被所述接收设备接收,则经所述第二路径向所述接收设备发送所述接收单元接收自所述处理器的第三数据报文中具有所述第一标识的数据报文。
第五方面,提供了一种网络设备。该网络设备应用于数据传输网络中。所述数据传输网络包括发送设备,至少一个中间设备,以及接收设备。所述网络设备为所述至少一个中间设备中的第一中间设备。所述发送设备通过所述中间设备向所述接收设备发送数据报文。所述网络设备包括接收器、发送器和处理器。
接收器,用于接收所述发送设备发送的第一数据报文,还用于接收所述接收设备发送的用于对所述第一数据报文进行确认的第一确认报文。
发送器,用于经第一路径向所述接收设备发送所述第一数据报文。
处理器,用于根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度,并根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口。
所述发送器,还用于将修改后的第一确认报文发送给所述发送设备。所述窗口值用于通知所述发送设备所述接收设备能够接收的字节数。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞 程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则所述处理器降低所述第一确认报文中具有第二标识的确认报文中的窗口值。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且不存在从所述网络设备到所述接收设备的拥塞程度为轻度或不拥塞的路径,则所述处理器降低所述第一确认报文中所有确认报文中的窗口值。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从所述网络设备到所述接收设备的拥塞程度为轻度或不拥塞的第二路径,则所述处理器设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。所述发送器,还用于在所述定时器超时后,经所述第二路径向所述接收设备发送所述接收单元接收自所述发送设备的第二数据报文中具有所述第一标识的数据报文。
可选地,所述发送器还用于,在所述定时器超时前,如果所述接收器收到所述接收设备发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被所述接收设备接收,则经所述第二路径向所述接收设备发送所述接收器接收自所述发送设备的第三数据报文中具有所述第一标识的数据报文。
第六方面,提供了一种网络接口控制器NIC。该NIC应用于数据传输网络中。所述数据传输网络包括发送设备,至少一个中间设备,以及接收设备。所述发送设备包括第一处理器和所述NIC。所述发送设备通过所述中间设备向所述接收设备发送数据报文。所述NIC包括接收器、发送器和第二处理器。
接收器,用于接收所述第一处理器发送的第一数据报文,还用于接收所述接收设备发送的用于对所述第一数据报文进行确认的第一确认报文。
发送器,用于经第一路径向所述接收设备发送所述第一数据报文;
第二处理器,用于根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度,并根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口。
所述发送器,还用于将修改后的第一确认报文发送给所述第一处理器。所述窗口值用于通知所述第一处理器所述接收设备能够接收的字节数。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则所述第二处理器降低所述第一确认报文中具有第二标识的确认报文中的窗口值。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、 目的端口、源IP地址和源端口相同。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且不存在从所述NIC到所述接收设备的拥塞程度为轻度或不拥塞的路径,则所述第二处理器降低所述第一确认报文中所有确认报文中的窗口值。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从所述NIC到所述接收设备的拥塞程度为轻度或不拥塞的第二路径,则所述第二处理器设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。所述发送器,还用于在所述定时器超时后,经所述第二路径向所述接收设备发送所述接收器接收自所述第一处理器的第二数据报文中具有所述第一标识的数据报文。
可选地,所述发送器,还用于在所述定时器超时前,如果所述接收器收到所述接收设备发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被所述接收设备接收,则经所述第二路径向所述接收设备发送所述接收器接收自所述第一处理器的第三数据报文中具有所述第一标识的数据报文。
第七方面,提供了一种网络设备。该网络设备应用于数据传输网络中。所述数据传输网络包括发送设备,至少一个中间设备,以及接收设备。所述网络设备为所述发送设备。所述发送设备通过所述中间设备向所述接收设备发送数据报文。所述发送设备包括:处理器和网络接口控制器NIC。
所述处理器,用于向所述NIC发送第一数据报文。
所述NIC,用于接收所述处理器发送的第一数据报文,经第一路径向所述接收设备发送所述第一数据报文,还用于接收所述接收设备发送的用于对所述第一数据报文进行确认的第一确认报文,根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度,根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数修改所述第一确认报文中的窗口值,并将修改后的第一确认报文发送给所述处理器。所述窗口值用于通知所述处理器所述接收设备能够接收的字节数。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口。
所述处理器进一步用于接收所述NIC发送的所述修改后的第一确认报文,根据所述修改后的第一确认报文通过所述NIC向所述接收设备发送第二数据报文。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则所述NIC降低所述第一确认报文中具有第二标识的确认报文中的窗口值。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞 程度为重度,且不存在从所述NIC到所述接收设备的拥塞程度为轻度或不拥塞的路径,则所述NIC降低所述第一确认报文中所有确认报文中的窗口值。
可选地,以上修改所述第一确认报文中的窗口值包括:如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从所述NIC到所述接收设备的拥塞程度为轻度或不拥塞的第二路径,则所述NIC设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。所述第二标识包括源IP地址、源端口、目的IP地址和目的端口。所述第一标识中的源IP地址、源端口、目的IP地址和目的端口分别与所述第二标识中的目的IP地址、目的端口、源IP地址和源端口相同。所述NIC,还用于在所述定时器超时后,经所述第二路径向所述接收设备发送所述NIC接收自所述处理器的第二数据报文中具有所述第一标识的数据报文。
可选地,所述NIC,还用于在所述定时器超时前,如果所述NIC收到所述接收设备发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被所述接收设备接收,则经所述第二路径向所述接收设备发送所述NIC接收自所述处理器的第三数据报文中具有所述第一标识的数据报文。
第八方面,提供了一种计算机可读存储介质。所述计算机可读存储介质中存储有指令,当该指令在计算机上运行时,使得计算机执行上述各方面所述的方法。
第九方面,提供了一种包含指令的计算机程序产品。当该计算机程序产品在计算机上运行时,使得计算机执行上述各方面所述的方法。
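Brief Description of the Drawings
FIG. 1 is a schematic diagram of a network structure according to an embodiment of this application;
FIG. 2 is a flowchart of a congestion control method according to an embodiment of this application;
FIG. 3 is a structural diagram of a network device 300 according to an embodiment of this application;
FIG. 4 is a structural diagram of a network device 400 according to an embodiment of this application;
FIG. 5 is a schematic diagram of a network structure according to an embodiment of this application;
FIG. 6 is a flowchart of a congestion control method according to an embodiment of this application;
FIG. 7 is a structural diagram of a network interface controller 700 according to an embodiment of this application;
FIG. 8 is a structural diagram of a network interface controller 800 according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The application scenarios described in the embodiments of this application are intended to explain the technical solutions of the embodiments more clearly and do not limit them. A person of ordinary skill in the art will appreciate that, as network architectures evolve and new application scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.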
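FIG. 1 is a schematic diagram of a network structure according to an embodiment of this application. The network includes a sending device 101, at least one intermediate device (for example, intermediate devices 102, 103, 104, 105, 107, and 108 in FIG. 1), and a receiving device 106. The sending device 101 sends data packets to the receiving device 106 through the intermediate devices. The sending device 101 can establish a Transmission Control Protocol (TCP) connection with the receiving device 106. All data packets sent by the sending device 101 arrive at intermediate device 102 and can then reach the receiving device 106 via a first path. Intermediate device 102 can be the first-hop network device reached by data packets sent from the sending device 101 to the receiving device 106, or the Nth-hop network device, where N is an integer greater than 1. The first path can include zero, one, or more intermediate devices (for example, intermediate devices 103, 104, and 105 in FIG. 1). An acknowledgment packet (for example, an ACK packet) that the receiving device 106 sends to acknowledge a data packet from the sending device 101 can reach intermediate device 102 via the first path and then be sent by intermediate device 102 to the sending device 101. Alternatively, the acknowledgment packet can reach intermediate device 102 via a second path different from the first path and then be sent by intermediate device 102 to the sending device 101. The second path can include zero, one, or more intermediate devices (for example, intermediate devices 105, 108, and 107 in FIG. 1). All acknowledgment packets that acknowledge data packets sent by the sending device 101 pass through intermediate device 102 to reach the sending device 101. Taking the application of the network structure of FIG. 1 to a data center as an example, the sending device 101 can be a host or a server, the receiving device 106 can be a host or a server, and the intermediate devices can be switches, routers, virtual switches, and the like. Intermediate device 102 can also be a top-of-rack (TOR) switch, and intermediate device 105 can also be a TOR switch.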
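FIG. 2 is a flowchart of a congestion control method according to an embodiment of this application. The method can be applied to the network structure shown in FIG. 1. The method includes the following steps:
S201. Intermediate device 102 receives a first data packet sent by the sending device 101, and sends the first data packet to the receiving device 106 via the first path.
The first path can include zero, one, or more intermediate devices, for example intermediate devices 103, 104, and 105 shown in FIG. 1.
After receiving the first data packet, intermediate device 102 selects a transmission path for it and uses source routing to carry information about the path to be traversed in the data packet it sends. The path information is, for example, the IP addresses of the intermediate devices on the first path that the first data packet is to traverse. For example, intermediate device 102 adds the IP addresses of the intermediate devices to be traversed (for example, the IP addresses of intermediate devices 103, 104, and 105) to the options field of the Internet Protocol (IP) header of the first data packet. The intermediate devices on the first path can then forward the data packet according to the IP addresses in the options field of the IP header. For the specific implementation of source routing, refer to Internet Engineering Task Force (IETF) RFC 791 (INTERNET PROTOCOL, DARPA INTERNET PROGRAM PROTOCOL SPECIFICATION), the entire content of which is incorporated herein by reference.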
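As a rough illustration of the source-routing step, the sketch below builds an IP source-route option from a list of hop addresses. It assumes the loose source and record route option of RFC 791 (option type 131); the embodiment only says that the hop addresses are placed in the options field, so the choice of option type, the function name `build_lsrr_option`, and the example addresses are assumptions made for illustration.

```python
import socket

LSRR_TYPE = 131   # loose source and record route option type (RFC 791)
FIRST_POINTER = 4  # pointer initially refers to the first address in the route data


def build_lsrr_option(hop_addresses):
    """Return the option bytes carrying the given IPv4 hop addresses.

    Hypothetical helper: the IP options area is at most 40 bytes, so an
    option of 3 header bytes plus 4 bytes per hop fits at most 9 hops.
    """
    if not 1 <= len(hop_addresses) <= 9:
        raise ValueError("a source-route option carries 1 to 9 addresses")
    length = 3 + 4 * len(hop_addresses)
    option = bytes([LSRR_TYPE, length, FIRST_POINTER])
    for address in hop_addresses:
        option += socket.inet_aton(address)  # 4-byte network-order address
    # Pad with End-of-Option-List bytes so the options area is a multiple of 4 bytes.
    padding = (-len(option)) % 4
    return option + b"\x00" * padding


# Example with made-up addresses standing in for intermediate devices 103, 104, and 105.
option_bytes = build_lsrr_option(["10.0.0.3", "10.0.0.4", "10.0.0.5"])
```

A real device would also have to rewrite the IP header length and checksum after adding the option; that part is omitted here.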
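The first data packet can include multiple data packets, which can have different identifiers and can be distinguished by those identifiers. An identifier includes, for example, a source IP address, a source port, a destination IP address, and a destination port. It can also include the transport-layer protocol number (the protocol number of TCP is 6), in which case the identifier forms a 5-tuple. After selecting the first path for the first data packet, intermediate device 102 can record the correspondence between the identifier of each data packet in the first data packet and the first path. For example, it records the first identifier of the data packets having the first identifier in the first data packet (for example, the source IP address of the sending device 101, a source port on the sending device 101, the destination IP address of the receiving device 106, and a destination port on the receiving device 106) together with the corresponding information about the first path (for example, the IP addresses of intermediate devices 103, 104, and 105). It can likewise record another identifier carried by other data packets in the first data packet (for example, the source IP address of the sending device 101, another source port on the sending device 101, the destination IP address of the receiving device 106, and another destination port on the receiving device 106) together with the corresponding information about the first path (for example, the IP addresses of intermediate devices 103, 104, and 105). Intermediate device 102 can record the correspondence between the first path and the identifiers of all data packets sent over it. The data packets sent over the first path can also include data packets that do not travel from the sending device 101 to the receiving device 106 (that is, the sender is not the sending device 101 or the receiver is not the receiving device 106, but the data packets are sent over the first path; not shown in FIG. 1). Similarly, intermediate device 102 can record the identifiers of data packets sent to the receiving device 106 over other paths and the corresponding path information. Because intermediate device 102 records the correspondence between data packet identifiers and paths, it can determine the path that corresponds to an identifier (for example, determine the first path from the first identifier).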
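S202. The intermediate devices on the first path set a congestion mark in the first data packet.
As the first data packet passes through each intermediate device on the first path (for example, intermediate devices 103, 104, and 105), each of those devices can determine a congestion level from the length of the queue on that device that holds the first data packet. The queue length is, for example, the length of the data packets waiting to be sent that are buffered in the device's send buffer. Specifically, two thresholds can be set on an intermediate device (for example, t1 and t2, where t1 < t2). When the queue length is less than t1, the congestion level is not congested. When the queue length is greater than or equal to t1 and less than or equal to t2, the congestion level is mild. When the queue length is greater than t2, the congestion level is severe. The intermediate device compares its congestion level with the congestion level indicated by the congestion mark of the first data packet. The congestion mark of the first data packet can include multiple congestion marks, each carried in one of the data packets in the first data packet. When the congestion level on the intermediate device is higher than the level indicated by the congestion mark of the first data packet, the intermediate device updates the congestion mark in the first data packet with the mark corresponding to its own congestion level. Otherwise it does not update the congestion mark.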
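The first data packet uses the Differentiated Services Code Point (DSCP) field of the IP header as the congestion mark. For details of the DSCP field, refer to IETF RFC 2474 (Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers), the entire content of which is incorporated herein by reference. In this embodiment, 4 bits of the DSCP field are used. In data packets sent from the sending device 101 to the receiving device 106, for example, the first 2 of these 4 bits can serve as the congestion mark. A mark of 00 indicates no congestion, 01 indicates mild congestion, and 10 indicates severe congestion.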
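Taking intermediate device 103 as an example: when intermediate device 103 determines that its own congestion level is severe, it updates the congestion mark to 10 if the mark in the first data packet is 00 or 01, and leaves the mark unchanged if it is already 10. When intermediate device 103 determines that its own congestion level is mild, it updates the mark to 01 if the mark in the first data packet is 00, and leaves the mark unchanged if it is 01 or 10. When intermediate device 103 determines that it is not congested, it does not update the mark. With this update mechanism, the congestion mark in the data packet received by the receiving device 106 indicates the most severe congestion level along the entire path.
Similarly, intermediate devices on other paths can set congestion marks in the same way for data packets sent over those paths. In addition, intermediate device 102 can also determine a congestion level from the length of its own queue that holds the first data packet, and update the congestion mark in the first data packet with the mark corresponding to that level.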
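The marking rule of S202 can be sketched as follows. This is a minimal illustration, not the device implementation: the function names and the threshold values in the example are assumptions, and the mark is treated as a plain 2-bit integer rather than being read from or written to an IP header.

```python
# 2-bit congestion marks as described in S202.
NOT_CONGESTED = 0b00
MILD = 0b01
SEVERE = 0b10


def local_congestion_level(queue_length, t1, t2):
    """Map a queue length to a congestion level using thresholds t1 < t2."""
    if queue_length < t1:
        return NOT_CONGESTED
    if queue_length <= t2:
        return MILD
    return SEVERE


def update_mark(packet_mark, queue_length, t1, t2):
    """Return the mark the packet carries after passing this device.

    The mark is only ever raised, so after the last hop it reflects
    the most severe congestion level seen along the path.
    """
    return max(packet_mark, local_congestion_level(queue_length, t1, t2))


# Example: a packet crosses three devices with queue lengths 5, 50, and 20
# (hypothetical values) and thresholds t1=10, t2=40.
mark = NOT_CONGESTED
for queue_length in (5, 50, 20):
    mark = update_mark(mark, queue_length, t1=10, t2=40)
```

Encoding the levels so that a larger number means heavier congestion lets the "update only if more severe" rule reduce to a single `max`.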
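S203. The receiving device 106 sends a first acknowledgment packet to intermediate device 102.
After receiving the first data packet, the receiving device 106 obtains the congestion mark in it, for example the first two bits of the DSCP field, and generates the first acknowledgment packet. The first acknowledgment packet acknowledges the first data packet and is, for example, an ACK packet in TCP. The first acknowledgment packet can include multiple acknowledgment packets, each acknowledging one data packet in the first data packet, and these can be distinguished by identifier. An identifier includes, for example, a source IP address, a source port, a destination IP address, and a destination port. It can also include the transport-layer protocol number (the protocol number of TCP is 6), in which case the identifier forms a 5-tuple. The identifier of a data packet and the identifier of the acknowledgment packet that acknowledges it correspond as follows: the source IP address, source port, destination IP address, and destination port of the data packet's identifier are respectively the same as the destination IP address, destination port, source IP address, and source port of the acknowledgment packet's identifier. The identifiers of the first data packet and of the first acknowledgment packet have this correspondence. Thus the first identifier of the data packets having the first identifier in the first data packet maps to the second identifier of the acknowledgment packets having the second identifier in the first acknowledgment packet.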
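The correspondence between a data packet identifier and the identifier of its acknowledgment amounts to swapping the source and destination halves of the 4-tuple. A small sketch, in which the `FlowId` type, the function name, and the example addresses and ports are illustrative assumptions:

```python
from typing import NamedTuple


class FlowId(NamedTuple):
    """4-tuple identifier: source IP, source port, destination IP, destination port."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def reverse_id(flow_id: FlowId) -> FlowId:
    """Swap source and destination.

    Applied to a data packet identifier it yields the identifier of the
    acknowledgment packets; applied again it yields the original identifier.
    """
    return FlowId(flow_id.dst_ip, flow_id.dst_port, flow_id.src_ip, flow_id.src_port)


# Hypothetical first identifier (sending device to receiving device) ...
first_id = FlowId("10.0.0.1", 5000, "10.0.0.6", 80)
# ... and the second identifier carried by the acknowledgment packets.
second_id = reverse_id(first_id)
```

Because the mapping is its own inverse, the same function lets the intermediate device recover the first identifier from an incoming acknowledgment and look up the recorded path.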
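The first acknowledgment packet uses the last 2 bits of the DSCP field of the IP header as the congestion mark. The value of the congestion mark in the first acknowledgment packet sent by the receiving device 106 is the same as the value of the congestion mark in the first data packet received by the receiving device 106.
The first acknowledgment packet can reach intermediate device 102 over the same path that the first data packet traversed (that is, the first path). It can also reach intermediate device 102 over a second path different from the first path. The second path can include zero, one, or more intermediate devices, for example intermediate devices 105, 108, and 107 shown in FIG. 1.
Similarly, data packets that intermediate device 102 sends to the receiving device 106 over other paths also have corresponding acknowledgment packets sent by the receiving device 106 to intermediate device 102. The identifiers of those data packets and of the corresponding acknowledgment packets have the same correspondence described above. Likewise, for data packets that intermediate device 102 sends over other paths, the acknowledgment packets sent by the receiving device 106 can carry congestion marks in the same way as the first acknowledgment packet, and can be returned over the same path that the data packets traversed or over a different path.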
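S204. Intermediate device 102 determines the congestion level of the first path based on the congestion marks in the first acknowledgment packet.
From the second identifier in the first acknowledgment packet, intermediate device 102 can determine the first identifier (see the correspondence between the first and second identifiers in S203), and from the first identifier it can determine the information about the first path (see S201, in which intermediate device 102 can record the correspondence between the first identifier and the first path). Intermediate device 102 can then compute the congestion level from the number or proportion of acknowledgment packets in the first acknowledgment packet whose congestion marks indicate each congestion level. For example, intermediate device 102 computes the proportion of acknowledgment packets, among the multiple acknowledgment packets in the first acknowledgment packet, whose congestion mark indicates severe congestion. When the proportion is greater than a threshold r2, the first path is determined to be severely congested. When the proportion is less than or equal to r2 and greater than a threshold r1 (r1 < r2), the first path is determined to be mildly congested. When the proportion is less than r1, the first path is determined to be not congested. As another example, intermediate device 102 can take the combined number of acknowledgment packets whose mark indicates severe congestion and acknowledgment packets whose mark indicates mild congestion, divide it by the total number of acknowledgment packets, and compare the resulting proportion with thresholds r1 and r2 (r1 < r2). When the proportion is greater than r2, the first path is determined to be severely congested. When it is less than or equal to r2 and greater than r1, the first path is determined to be mildly congested. When it is less than r1, the first path is determined to be not congested. As a further example, intermediate device 102 compares the number of acknowledgment packets whose mark indicates severe congestion with thresholds n1 and n2 (n1 < n2). When the number is greater than n2, the first path is determined to be severely congested. When the number is less than or equal to n2 and greater than n1, the first path is determined to be mildly congested. When the number is less than n1, the first path is determined to be not congested.
Similarly, after receiving the acknowledgment packets that the receiving device 106 sends to acknowledge data packets sent over each path, intermediate device 102 can use the above procedure to determine and record the congestion level of each path from the congestion marks in the acknowledgment packets. Intermediate device 102 can thus record the identifiers of data packets, the information about the paths used, and the congestion levels of those paths.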
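The first variant in S204 (proportion of acknowledgment packets marked severe) might look like the sketch below. The function name and the example thresholds are assumptions. The text does not say what happens when the proportion equals r1 exactly, so this sketch treats that case as not congested and says so in a comment.

```python
NOT_CONGESTED = 0b00
MILD = 0b01
SEVERE = 0b10


def path_congestion_level(ack_marks, r1, r2):
    """Classify a path from the congestion marks of its acknowledgment packets.

    ack_marks: iterable of 2-bit marks, one per acknowledgment packet.
    r1, r2:    proportion thresholds with r1 < r2.
    """
    marks = list(ack_marks)
    if not marks:
        raise ValueError("at least one acknowledgment packet is required")
    severe_ratio = sum(1 for m in marks if m == SEVERE) / len(marks)
    if severe_ratio > r2:
        return SEVERE
    if severe_ratio > r1:
        return MILD
    # Assumption: a ratio exactly equal to r1 is treated as not congested.
    return NOT_CONGESTED


# Example with hypothetical thresholds r1 = 0.2 and r2 = 0.6:
# three of four acknowledgments are marked severe, so the ratio is 0.75.
level = path_congestion_level([SEVERE, SEVERE, SEVERE, NOT_CONGESTED], r1=0.2, r2=0.6)
```

The other two variants in the text differ only in what is counted (severe plus mild marks, or an absolute count compared with n1 and n2) and would follow the same shape.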
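S205. Intermediate device 102 modifies the window value in the first acknowledgment packet based on the congestion level of the first path and the number of bytes of the data packets having the first identifier in the first data packet, and sends the modified first acknowledgment packet to the sending device 101.
TCP includes a window field in the packet header for flow control. The receiving device sends ACK packets to the sending device and uses the window value in an ACK packet to notify the sending device of the number of bytes that the receiving device can receive. The sending device adjusts the sending rate of data packets according to the window value in the ACK packet. For TCP flow control, refer to IETF RFC 793 (TRANSMISSION CONTROL PROTOCOL, DARPA INTERNET PROGRAM PROTOCOL SPECIFICATION), the entire content of which is incorporated herein by reference.
Intermediate device 102 modifies the window value in the first acknowledgment packet based on the congestion level of the first path and the number of bytes of the data packets having the first identifier in the first data packet. When sending data packets, intermediate device 102 can count bytes separately for the data packets having each identifier, so that different window values can be determined from the congestion level of the path and the byte count of the data packets having each identifier. For example, when sending the first data packet in step S201, intermediate device 102 can count the bytes of the data packets having the first identifier in the first data packet. Intermediate device 102 compares the counted bytes with a set threshold f1 to determine whether the data packets having the first identifier belong to an elephant flow (a very large continuous flow) or a mice flow (a short flow). When the byte count of the data packets having a given identifier is greater than f1, those data packets belong to an elephant flow. Otherwise they belong to a mice flow. For example, intermediate device 102 counts the bytes of the data packets having the first identifier in the first data packet. When the count is greater than f1, those data packets belong to an elephant flow, and when the count is less than or equal to f1, they belong to a mice flow. Intermediate device 102 counts bytes cumulatively for the data packets having each identifier, so a cumulative byte count is available for each identifier. The accumulation for an identifier ends when intermediate device 102 receives an acknowledgment packet that includes the corresponding identifier and the FIN flag (see IETF RFC 793). From the identifier in the acknowledgment packet that includes the FIN flag, intermediate device 102 determines the identifier of the corresponding data packets, and either resets the cumulative byte count of the data packets having that identifier to zero or deletes the record entry for that identifier.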
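The per-identifier byte accounting described above can be sketched with a small table keyed by the 4-tuple. The class name `FlowByteCounter`, its method names, and the example values are assumptions for illustration. Identifiers are plain tuples of (source IP, source port, destination IP, destination port).

```python
class FlowByteCounter:
    """Cumulative byte counts per data packet identifier, with threshold f1."""

    def __init__(self, f1):
        self.f1 = f1
        self.bytes_by_flow = {}

    def on_data_sent(self, flow_id, num_bytes):
        """Add the size of a forwarded data packet to its flow's running total."""
        self.bytes_by_flow[flow_id] = self.bytes_by_flow.get(flow_id, 0) + num_bytes

    def is_elephant(self, flow_id):
        """A flow is an elephant flow once its byte count exceeds f1, else a mice flow."""
        return self.bytes_by_flow.get(flow_id, 0) > self.f1

    def on_fin_ack(self, ack_id):
        """Delete the record of the data flow acknowledged by a FIN-flagged packet."""
        src_ip, src_port, dst_ip, dst_port = ack_id
        data_flow_id = (dst_ip, dst_port, src_ip, src_port)  # reverse the identifier
        self.bytes_by_flow.pop(data_flow_id, None)


# Example with a hypothetical threshold of 1000 bytes.
counter = FlowByteCounter(f1=1000)
flow = ("10.0.0.1", 5000, "10.0.0.6", 80)
counter.on_data_sent(flow, 600)   # 600 bytes so far: mice flow
counter.on_data_sent(flow, 600)   # 1200 bytes so far: elephant flow
```

This sketch deletes the entry on FIN, which is one of the two options the text allows; resetting the count to zero would work equally well.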
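Intermediate device 102 can adjust the sending rates of elephant flows and mice flows differently according to the congestion level of the first path, which helps achieve more precise and effective congestion control. The adjustment is made by modifying the window value of the acknowledgment packets that acknowledge the data packets of the elephant flow or mice flow. For example, the window value can be modified as follows:
If the congestion level of the first path is mild, and the data packets having the first identifier in the first data packet belong to an elephant flow (that is, the byte count is greater than the threshold f1), the window value in the acknowledgment packets having the second identifier in the first acknowledgment packet is reduced, where the first identifier and the second identifier have the correspondence described above. The acknowledgment packets having the second identifier in the first acknowledgment packet can include multiple acknowledgment packets, each containing a window value, and the window value in each of them is reduced. The specific window value is computed, for example, in the way the DCTCP congestion control algorithm computes the cwnd value. For that computation, refer to the IETF document Datacenter TCP (DCTCP): TCP Congestion Control for Datacenters, draft-ietf-tcpm-dctcp-03, the entire content of which is incorporated herein by reference. Because mice flows in a data center clearly outnumber elephant flows, when the path is mildly congested, intermediate device 102 preferentially reduces the sending rate of elephant flows and preserves the sending rate of the more numerous mice flows, which helps reduce the average flow completion time (FCT).
If the congestion level of the first path is severe, intermediate device 102 checks whether there is a second path from intermediate device 102 to the receiving device 106 whose congestion level is not severe (that is, not congested or mildly congested). If no such second path exists (that is, there is no path to switch to), intermediate device 102 reduces the window value in all acknowledgment packets that acknowledge data packets sent over the first path. The specific window value is computed, for example, in the way the DCTCP congestion control algorithm computes the cwnd value. As a result, the sending rate of all data packets on the first path (both elephant flows and mice flows) is reduced.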
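If such a second path exists, intermediate device 102 enters a path-switching state, selects the second path as the new transmission path for the elephant flow, and performs path switching for the elephant flow. Intermediate device 102 sets to 0 the window value in the acknowledgment packets that acknowledge the data packets belonging to the elephant flow in the first data packet, and starts a timer. For example, if the data packets having the first identifier in the first data packet belong to an elephant flow, the window value in the acknowledgment packets having the second identifier in the first acknowledgment packet is set to 0 (these can include multiple acknowledgment packets, each containing a window value, and the window value in each is set to 0). After receiving an acknowledgment packet whose window value is 0, the sending device 101 determines that the number of bytes the receiving device 106 can receive is 0 and temporarily stops sending data packets having the first identifier, so that intermediate device 102 can perform the path switch. Setting the window value of the acknowledgment packets to 0 in this way makes the sending device temporarily stop sending data packets, which creates an opportunity to switch paths. Setting the timer gives the data packets in flight on the first path time to reach the receiving device 106 first, after which intermediate device 102 sends further data packets having the first identifier (which can include multiple data packets) to the receiving device 106 via the second path. This reduces the likelihood that the path switch causes data packets to arrive out of order at the receiving device 106. If, before the timer expires, intermediate device 102 receives a second acknowledgment packet sent by the receiving device 106, and the second acknowledgment packet indicates that all data packets having the first identifier in the first data packet (assumed to belong to the elephant flow) have been received by the receiving device 106, intermediate device 102 performs the path switch directly. This helps reduce waiting time.
The path switch includes sending to the sending device 101 an acknowledgment packet having the second identifier in which the window value is restored to the value it had before the path-switching state was entered (the acknowledgment packet having the second identifier can include multiple acknowledgment packets, each containing a window value, and the window value in each is restored to its value before the path-switching state was entered). After receiving the acknowledgment packet having the second identifier with the restored window value, the sending device 101 resumes sending further data packets having the first identifier (which can include multiple data packets) to intermediate device 102, and intermediate device 102 sends them to the receiving device 106 via the second path. The elephant flow is thereby moved to the second path.
If, before the timer expires, intermediate device 102 has not received an acknowledgment packet indicating that all data packets having the first identifier in the first data packet (assumed to belong to the elephant flow) have been received by the receiving device 106, then after the timer expires intermediate device 102 performs the path switch, which is the same as the path switch performed before the timer expires as described above.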
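The case analysis of S205 can be summarized as a small decision function. The sketch below is one reading of the text under stated assumptions: the `Action` names and function names are illustrative, mild congestion on a mice flow is assumed to leave the acknowledgment unchanged, and the window reduction uses the DCTCP-style rule cwnd × (1 − α/2) with a congestion estimate α between 0 and 1, since the text only says the window is computed "for example" as in DCTCP.

```python
from enum import Enum

NOT_CONGESTED = 0b00
MILD = 0b01
SEVERE = 0b10


class Action(Enum):
    FORWARD_UNCHANGED = "forward the acknowledgment unchanged"
    REDUCE_FLOW_WINDOW = "reduce the window in this flow's acknowledgments"
    REDUCE_ALL_WINDOWS = "reduce the window in all acknowledgments on the path"
    ZERO_WINDOW_AND_START_TIMER = "set the window to 0 and start the switch timer"


def choose_action(path_level, is_elephant, alternative_path_exists):
    """Pick how to treat acknowledgments of one flow on the first path."""
    if path_level == MILD and is_elephant:
        return Action.REDUCE_FLOW_WINDOW
    if path_level == SEVERE:
        if not alternative_path_exists:
            return Action.REDUCE_ALL_WINDOWS
        if is_elephant:
            return Action.ZERO_WINDOW_AND_START_TIMER
    # Assumption: every remaining case leaves the acknowledgment untouched.
    return Action.FORWARD_UNCHANGED


def reduced_window(window, alpha):
    """DCTCP-style reduction: window * (1 - alpha / 2), with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return int(window * (1 - alpha / 2))


# Example: an elephant flow on a severely congested path with a usable second path.
action = choose_action(SEVERE, is_elephant=True, alternative_path_exists=True)
```

After `ZERO_WINDOW_AND_START_TIMER`, the device would switch the flow to the second path either when the timer expires or earlier, once an acknowledgment shows that all in-flight packets of the flow have arrived; that state handling is left out of the sketch.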
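S206. The sending device 101 sends a second data packet to intermediate device 102.
The sending device 101 obtains the window value in the first acknowledgment packet sent by intermediate device 102, adjusts its sending rate according to that window value, and sends the second data packet to intermediate device 102. Adjusting the sending rate means, for example, adjusting the send window according to the window value in the first acknowledgment packet. For the specific implementation, refer to IETF RFC 793, under which a TCP sending device can dynamically adjust the size of its send window according to the window value in the ACK packets returned by the receiving device. In addition, when a path switch takes place, the operations performed by the sending device 101 during the switch are as described in S205. After the path switch, intermediate device 102 sends the data packets having the first identifier (assumed to belong to the elephant flow) among the data packets from the sending device 101 to the receiving device 106 via the second path. After the path switch, intermediate device 102 can still send data packets having other identifiers from the sending device 101 to the receiving device 106 via the first path.
In the above method, the sending device 101 only needs to send data packets to intermediate device 102 and adjust its sending rate (that is, its send window size) according to the window value in the acknowledgment packets sent by intermediate device 102. The processing on the sending device 101 follows the existing TCP protocol and requires no technical changes, so the method offers good compatibility and convenience.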
图3为本申请实施例提供的一种网络设备300的结构图。网络设备300可以是图1中所示中间设备102,执行图2所示方法中中间设备102执行的操作。网络设备300包括接收单元301、发送单元302和处理单元303。
接收单元301,用于接收发送设备101发送的第一数据报文,还用于接收接收设备106发送的用于对所述第一数据报文进行确认的第一确认报文。
发送单元302,用于经第一路径(例如包括中间设备103、104和105)向接收设备106发送所述第一数据报文。
处理单元303,用于根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度,并根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口(例如发送设备101的IP地址、发送设备101上的端口、接收设备106的IP地址和接收设备106上的端口)。
发送单元302,还用于将修改后的第一确认报文发送给发送设备101。所述窗口值用于通知发送设备101接收设备106能够接收的字节数。
处理单元303修改第一确认报文中的窗口值可以包括以下几种方式:
如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则处理单元303降低第一确认报文中具有第二标识的确认报文中的窗口值。第二标识包括源IP地址、源端口、目的IP地址和目的端口。第一标识和第二标识具有前述对应关系。
如果所述第一路径的拥塞程度为重度,且不存在从网络设备300到接收设备106的拥塞程度为轻度或不拥塞的路径,则处理单元303降低所述第一确认报文中所有确 认报文中的窗口值。
如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从网络设备300到接收设备106的拥塞程度为轻度或不拥塞的第二路径,则处理单元303设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。
在所述定时器超时前,如果接收单元301收到接收设备106发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被接收设备106接收,则发送单元302经所述第二路径向接收设备106发送接收单元301接收自发送设备101的第三数据报文中具有所述第一标识的数据报文。
在所述定时器超时后,发送单元302经所述第二路径向接收设备106发送接收单元301接收自发送设备101的第二数据报文中具有所述第一标识的数据报文。
以上接收单元301和发送单元302可以集成到一个单元,由该单元完成以上接收单元301和发送单元302的功能。
图4为本申请实施例提供的一种网络设备400的结构图。网络设备400可以是图1中所示中间设备102,执行图2所示方法中中间设备102执行的操作。网络设备400包括接收器401、发送器402和处理器403。
接收器401,用于接收发送设备101发送的第一数据报文,还用于接收接收设备106发送的用于对所述第一数据报文进行确认的第一确认报文。
发送器402,用于经第一路径(例如包括中间设备103、104和105)向接收设备106发送所述第一数据报文。
处理器403,用于根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度,并根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口(例如发送设备101的IP地址、发送设备101上的端口、接收设备106的IP地址和接收设备106上的端口)。
发送器402,还用于将修改后的第一确认报文发送给发送设备101。所述窗口值用于通知发送设备101接收设备106能够接收的字节数。
处理器403修改第一确认报文中的窗口值可以包括以下几种方式:
如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则处理器403降低第一确认报文中具有第二标识的确认报文中的窗口值。第二标识包括源IP地址、源端口、目的IP地址和目的端口。第一标识和第二标识具有前述对应关系。
如果所述第一路径的拥塞程度为重度,且不存在从网络设备400到接收设备106的拥塞程度为轻度或不拥塞的路径,则处理器403降低所述第一确认报文中所有确认报文中的窗口值。
如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从网络设备400到接收设备106的拥塞程度为轻度或不拥塞的第二路径,则处理器403设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。
在所述定时器超时前,如果接收器401收到接收设备106发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被接收设备106接收,则发送器402经所述第二路径向接收设备106发送接收器401接收自发送设备101的第三数据报文中具有所述第一标识的数据报文。
在所述定时器超时后,发送器402经所述第二路径向接收设备106发送接收器401接收自发送设备101的第二数据报文中具有所述第一标识的数据报文。
接收器401和发送器402可以和处理器403通过总线通信,也可以直连。接收器401和发送器402可以集成到一个部件上,由该部件完成以上接收器401和发送器402的功能,该部件例如是网络接口。该网络接口例如是以太网接口、异步传输模式(英文:Asynchronous Transfer Mode,ATM)接口或基于SDH/SONET的包封装(英文:Packet over SONET/SDH,POS)接口。
处理器403包括但不限于中央处理器(英文:Central Processing Unit,CPU),网络处理器(英文:Network Processor,NP),专用集成电路(英文:Application-Specific Integrated Circuit,ASIC)或者可编程逻辑器件(英文:Programmable Logic Device,PLD)中的一个或多个。上述PLD可以是复杂可编程逻辑器件(英文:Complex Programmable Logic Device,CPLD),现场可编程逻辑门阵列(英文:Field-Programmable Gate Array,FPGA),通用阵列逻辑(英文:Generic Array Logic,GAL)或其任意组合。
图5为本申请实施例提供的一种网络结构示意图。该网络包括发送设备501、至少一个中间设备(例如图5中的中间设备502、503、505和506)和接收设备504。发送设备501通过所述中间设备向接收设备504发送数据报文。发送设备501包括处理器5011和网络接口控制器NIC5012。网络接口控制器NIC5012也可以称作网络适配器(英文:network adapter),网络接口卡(英文:network interface card),或网卡(network card)等。发送设备501上处理器5011可以执行TCP进程,经过NIC5012与接收设备504建立TCP连接。
处理器5011发送的数据报文均先到达NIC5012,随后经由第一路径到达接收设备504。该第一路径可以包括0个、1个或多个中间设备(例如图5中的中间设备502和503)。接收设备504发送的用于对发送设备501发送的数据报文进行确认的确认报文(例如ACK报文)可以经由该第一路径到达NIC5012,再由NIC5012发送给处理器5011。或者该确认报文可以经由与第一路径不同的第二路径到达NIC5012,再由NIC5012发送给处理器5011。该第二路径可以包括0个、1个或多个中间设备(例如图5中的中间设备506和505)。用于对发送设备501发送的数据报文进行确认的确认报文均经过NIC5012到达处理器5011。以图5网络结构应用于数据中心为例,发送设备501可以是主机或服务器,接收设备504可以是主机或服务器,以上所述中间设备可以为交换机、路由器、虚拟交换机等。中间设备502和中间设备505还可以是柜顶(英文:top of rack,TOR)交换机,中间设备503和中间设备506也可以是TOR交换机。
图6为本申请实施例提供的一种拥塞控制方法的流程图。该方法可应用于图5所示网络结构。该方法包括如下步骤:
S601、网络接口控制器5012接收处理器5011发送的第一数据报文,经第一路径 向接收设备504发送所述第一数据报文。
第一路径可以包括0个、1个或多个中间设备,例如图5所示中间设备502和503。
网络接口控制器NIC5012接收到第一数据报文后,选择第一数据报文的传输路径,并采用源路由(英文:source routing)技术,在发送的数据报文中携带要经过的路径的信息。该路径的信息例如是第一数据报文要经过的第一路径包括的各个中间设备的IP地址。举例来说,NIC5012在第一数据报文的IP报文头部的选项字段中添加要经过的中间设备的IP地址(例如中间设备502和503的IP地址)。由此第一路径上的各中间设备可以根据IP报文头部的选项字段中的IP地址对数据报文进行转发。
第一数据报文中可以包括多个数据报文,该多个数据报文可以具有不同的标识。该多个数据报文可以通过标识进行区分。该标识例如包括源IP地址、源端口、目的IP地址和目的端口。该标识例如还可以包括传输层协议号(TCP的传输层协议号为6),由此该标识构成一个五元组。NIC5012在为第一数据报文选择第一路径后,可以记录第一数据报文中每个数据报文的标识和该第一路径之间的对应关系。例如记录接收自处理器5011的第一数据报文中具有第一标识的数据报文的第一标识(例如包括发送设备501的源IP地址、发送设备501上的源端口、接收设备504的目的IP地址和接收设备504上的目的端口)以及对应的该第一路径的信息(例如包括中间设备502和503的IP地址)。例如还可以记录第一数据报文中具有其他标识的数据报文的该其他标识(例如包括发送设备501的源IP地址、发送设备501上的另一源端口、接收设备504的目的IP地址和接收设备504上的另一目的端口)以及对应的该第一路径的信息(例如包括中间设备502和503的IP地址)。NIC5012可以将所有经过第一路径发送的数据报文的标识与该第一路径的对应关系进行记录。该经过第一路径发送的数据报文还可以包括发送设备501发往接收设备504以外的其他接收设备的数据报文(即接收端不是接收设备504,但数据报文经该第一路径发送,图5未示出)。同样地,NIC5012还可以记录经其他路径向接收设备504发送的数据报文的标识及对应的路径的信息。NIC5012记录了数据报文的标识与路径的对应关系,由此可以根据数据报文的标识确定标识对应的路径(例如根据该第一标识确定该第一路径)。
S602、第一路径上的中间设备在所述第一数据报文中设置拥塞标记。
第一数据报文在经过第一路径上的各中间设备(例如中间设备502和503)时,第一路径上的各中间设备都可以根据当前设备上用于保存该第一数据报文的队列长度确定拥塞程度,并根据拥塞程度选择是否更新拥塞标记。该队列长度例如是各中间设备上发送缓存中缓存的待发送数据报文的长度。确定拥塞程度和更新拥塞标记的方式可参照S202中所述,在此不再赘述。
同样地,其他路径上的中间设备也可以用以上方式对经其他路径进行发送的数据报文设置拥塞标记。此外,NIC5012也可以根据当前NIC5012上用于保存第一数据报文的队列长度确定拥塞程度,并用该拥塞程度对应的拥塞标记来更新第一数据报文中的拥塞标记。
S603、接收设备504向网络接口控制器5012发送第一确认报文。
接收设备504接收到第一数据报文后,获取第一数据报文中的拥塞标记,生成第一确认报文。生成方式及第一确认报文的内容可参照S203中所述,在此不再赘述。
第一确认报文可以通过第一数据报文发送时经过的同一路径(即第一路径)到达NIC5012。第一确认报文也可以通过不同于第一路径的第二路径到达NIC5012。第二路径可以包括0个、1个或多个中间设备,例如图5所示中间设备506和505。
同样地,NIC5012经其他路径向接收设备504发送的数据报文也具有对应的由接收设备504向NIC5012发送的确认报文。该经其他路径发送的数据报文的标识与对应的确认报文的标识也具有以上S203中描述的对应关系。同样地,对于该NIC5012经其他路径发送的数据报文,接收设备504发送的确认报文也可以采用S203在第一确认报文中使用拥塞标记的方式,并经与数据报文发送时经过的同一路径或不同路径返回该确认报文。
S604、NIC5012根据第一确认报文中的拥塞标记确定第一路径的拥塞程度。
NIC5012根据第一确认报文中的第二标识可以确定第一标识,并进一步根据第一标识确定第一路径的信息。进而,NIC5012可以根据第一确认报文中拥塞标记指示各个拥塞程度的确认报文的数量或比例来计算拥塞程度。确定第一路径的拥塞程度的方式可参照S204中所述,在此不再赘述。
同样地,NIC5012接收到接收设备504发送的用于对各个路径发送的数据报文进行确认的确认报文后,可以采用以上过程,根据确认报文中的拥塞标记确定各个路径的拥塞程度并进行记录。由此,NIC5012可以记录数据报文的标识、使用的路径的信息和路径拥塞程度。
S605、NIC5012根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值,将修改后的第一确认报文发送给处理器5011。
NIC5012根据第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改第一确认报文中的窗口值。NIC5012在发送数据报文时,可以对数据报文中具有各个标识的数据报文分别进行字节数统计,以便根据路径的拥塞程度和具有各个标识的数据报文的字节数大小来确定不同的窗口值。举例来说,步骤S601中NIC5012发送第一数据报文时,对第一数据报文中具有各个标识的数据报文进行字节数统计。NIC5012将统计的字节数与阈值f1相比较以区分具有各个标识的数据报文属于大象流还是老鼠流。NIC5012区分数据报文属于大象流还是老鼠流的方式可参照S205中所述,在此不再赘述。
NIC5012可以根据第一路径不同的拥塞程度,对大象流老鼠流的发送速率做不同调整。该调整通过修改用于对该大象流或老鼠流中的数据报文进行确认的确认报文的窗口值来实现。举例来说,可以按照如下方式修改窗口值:
如果第一路径的拥塞程度为轻度,减小第一确认报文中具有第二标识的确认报文中的窗口值。具体可参照S205中所述,在此不再赘述。
如果第一路径的拥塞程度为重度,则NIC5012查找是否存在拥塞程度不为重度(拥塞程度为不拥塞或轻度拥塞)的从NIC5012到接收设备504的第二路径。如果不存在该第二路径(即没有可切换的路径),则NIC5012降低所有用于对经第一路径发送的数据报文进行确认的确认报文中的窗口值。具体窗口值的计算例如采用DCTCP的拥塞控制算法中对cwnd值的计算。
如果存在该第二路径,则NIC5012进入换路状态,为大象流选择该第二路径做为新的传输路径,对大象流进行换路处理。NIC5012设置用于对第一数据报文中的大象流进行确认的确认报文中的窗口值为0,并启动定时器。例如第一数据报文中具有第一标识的数据报文属于大象流,设置第一确认报文中具有第二标识的确认报文中的窗口值为0(具体可参照S205中所述)。处理器5011在收到窗口值为0的确认报文后,确定接收设备504能够接收的字节数为0,将会暂时停止发送具有第一标识的数据报文,以便NIC5012进行换路处理。如果在定时器超时前,NIC5012收到接收设备504发送的第二确认报文,且第二确认报文指示所有第一数据报文中具有第一标识的数据报文(假设属于大象流)已被接收设备504接收,则NIC5012直接执行换路处理。
以上所述换路处理包括NIC5012向处理器5011发送具有第二标识的确认报文,该确认报文中的窗口值恢复为进入换路状态之前的窗口值(具体可参照S205中所述)。处理器5011收到具有第二标识的窗口值恢复的确认报文后,重新开始向NIC5012发送另一具有第一标识的数据报文(该另一具有第一标识的数据报文可以包括多个数据报文),NIC5012经该第二路径向接收设备504发送该具有第一标识的数据报文。由此,所述大象流被换到第二路径进行传输。
如果在定时器超时前,NIC5012没有收到指示第一数据报文中所有具有第一标识的数据报文(假设属于大象流)已被接收设备504接收的确认报文,则在定时器超时后,NIC5012执行换路处理,该换路处理与上述定时器超时前执行的换路处理相同。
S606、处理器5011向网络接口控制器5012发送第二数据报文。
处理器5011获取网络接口控制器NIC5012发送的第一确认报文中的窗口值,根据该窗口值调整发送速率,向NIC5012发送第二数据报文。调整发送速率例如是根据第一确认报文中的窗口值,调整发送窗口值。具体实现可参照IETF RFC793,其中TCP协议中发送设备可以根据接收设备返回的ACK报文中的窗口值动态调整发送窗口大小。另外,当存在换路处理时,换路处理过程中处理器5011的执行操作可参照S605中的描述。在换路处理后,NIC5012将来自于处理器5011的数据报文中具有第一标识的数据报文(假设属于大象流)经第二路径向接收设备504发送。在换路处理后,NIC5012仍可以将来自于处理器5011的数据报文中具有其他标识的数据报文经第一路径向接收设备504发送。
以上方法中处理器5011只需向NIC5012发送数据报文,并根据NIC5012发送的确认报文中的窗口值调整发送速率(即调整发送窗口大小)。以上处理器5011上的处理遵循现有TCP协议,不需要做技术改动,发送设备501上只需添加具有以上功能的网络接口控制器NIC5012即可实现本方法。由此本方法具有较好的兼容性和便利性。
图7为本申请实施例提供的一种网络接口控制器700的结构图。网络接口控制器700(即NIC700)可以是图5中所示网络接口控制器NIC5012,执行图6所示方法中网络接口控制器NIC5012执行的操作。网络接口控制器700包括接收单元701、发送单元702和处理单元703。
接收单元701,用于接收处理器5011发送的第一数据报文,还用于接收接收设备504发送的用于对所述第一数据报文进行确认的第一确认报文。
发送单元702,用于经第一路径(例如包括中间设备502和503)向接收设备504 发送所述第一数据报文。
处理单元703,用于根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度,并根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值。所述第一标识包括源IP地址、源端口、目的IP地址和目的端口(例如发送设备501的IP地址、发送设备501上的端口、接收设备504的IP地址和接收设备504上的端口)。
发送单元702,还用于将修改后的第一确认报文发送给处理器5011。所述窗口值用于通知处理器5011接收设备504能够接收的字节数。
处理单元703修改第一确认报文中的窗口值可以包括以下几种方式:
如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则处理单元703降低第一确认报文中具有第二标识的确认报文中的窗口值。第二标识包括源IP地址、源端口、目的IP地址和目的端口。第一标识和第二标识具有前述对应关系。
如果所述第一路径的拥塞程度为重度,且不存在从NIC700到接收设备504的拥塞程度为轻度或不拥塞的路径,则处理单元703降低所述第一确认报文中所有确认报文中的窗口值。
如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从NIC700到接收设备504的拥塞程度为轻度或不拥塞的第二路径,则处理单元703设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。
在所述定时器超时前,如果接收单元701收到接收设备504发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被接收设备504接收,则发送单元702经所述第二路径向接收设备504发送接收单元701接收自处理器5011的第三数据报文中具有所述第一标识的数据报文。
在所述定时器超时后,发送单元702经所述第二路径向接收设备504发送接收单元701接收自处理器5011的第二数据报文中具有所述第一标识的数据报文。
以上接收单元701和发送单元702可以集成到一个单元,由该单元完成以上接收单元701和发送单元702的功能。
图8为本申请实施例提供的一种网络接口控制器800的结构图。网络接口控制器800(即NIC800)可以是图5中所示网络接口控制器NIC5012,执行图6所示方法中网络接口控制器NIC5012执行的操作。网络接口控制器800包括接收器801、发送器802和处理器803。
接收器801,用于接收处理器5011发送的第一数据报文,还用于接收接收设备504发送的用于对所述第一数据报文进行确认的第一确认报文。
发送器802,用于经第一路径(例如包括中间设备502和503)向接收设备504发送所述第一数据报文。
处理器803,用于根据所述第一确认报文中的拥塞标记确定所述第一路径的拥塞程度,并根据所述第一路径的拥塞程度和所述第一数据报文中具有第一标识的数据报文的字节数,修改所述第一确认报文中的窗口值。所述第一标识包括源IP地址、源端 口、目的IP地址和目的端口(例如发送设备501的IP地址、发送设备501上的端口、接收设备504的IP地址和接收设备504上的端口)。
发送器802,还用于将修改后的第一确认报文发送给处理器5011。所述窗口值用于通知处理器5011接收设备504能够接收的字节数。
处理器803修改第一确认报文中的窗口值可以包括以下几种方式:
如果所述第一路径的拥塞程度为轻度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,则处理器803降低第一确认报文中具有第二标识的确认报文中的窗口值。第二标识包括源IP地址、源端口、目的IP地址和目的端口。第一标识和第二标识具有前述对应关系。
如果所述第一路径的拥塞程度为重度,且不存在从NIC800到接收设备504的拥塞程度为轻度或不拥塞的路径,则处理器803降低所述第一确认报文中所有确认报文中的窗口值。
如果所述第一路径的拥塞程度为重度,且第一数据报文中具有所述第一标识的数据报文的字节数大于设定的阈值,且存在从NIC800到接收设备504的拥塞程度为轻度或不拥塞的第二路径,则处理器803设置所述第一确认报文中具有第二标识的确认报文的窗口值为0,并启动定时器。
在所述定时器超时前,如果接收器801收到接收设备504发送的第二确认报文,且所述第二确认报文指示所述第一数据报文中所有具有所述第一标识的数据报文已被接收设备504接收,则发送器802经所述第二路径向接收设备504发送接收器801接收自处理器5011的第三数据报文中具有所述第一标识的数据报文。
在所述定时器超时后,发送器802经所述第二路径向接收设备504发送接收器801接收自处理器5011的第二数据报文中具有所述第一标识的数据报文。
接收器801和发送器802可以和处理器803通过总线通信,也可以直连。该总线例如是高速外设组件互联(英文:Peripheral Component Interconnect Express,PCI-E)总线。接收器801和发送器802可以集成到一个部件上,由该部件完成以上接收器801和发送器802的功能,该部件例如是网络接口。该网络接口例如是以太网接口、异步传输模式(英文:Asynchronous Transfer Mode,ATM)接口或基于SDH/SONET的包封装(英文:Packet over SONET/SDH,POS)接口。
处理器803包括但不限于中央处理器(英文:Central Processing Unit,CPU),网络处理器(英文:Network Processor,NP),专用集成电路(英文:Application-Specific Integrated Circuit,ASIC)或者可编程逻辑器件(英文:Programmable Logic Device,PLD)中的一个或多个。上述PLD可以是复杂可编程逻辑器件(英文:Complex Programmable Logic Device,CPLD),现场可编程逻辑门阵列(英文:Field-Programmable Gate Array,FPGA),通用阵列逻辑(英文:Generic Array Logic,GAL)或其任意组合。
本领域技术人员应明白,本申请的实施例可提供为方法、装置(设备)、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用 程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。计算机程序存储/分布在合适的介质中,与其它硬件一起提供或作为硬件的一部分,也可以采用其他分布形式,如通过Internet或其它有线或无线电信***。
本申请是参照本申请实施例的方法、装置(设备)和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
本说明书的各个部分均采用递进的方式进行描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点介绍的都是与其他实施例不同之处。尤其,对于装置和***实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例部分的说明即可。
应理解,在本申请的各种实施例中,上述各方法的序号的大小并不意味着执行顺序的先后,各方法的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。

Claims (29)

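  1. A congestion control method, applied to a data transmission network, wherein the data transmission network comprises a sending device, at least one intermediate device, and a receiving device, and the sending device sends data packets to the receiving device through the intermediate device, characterized in that the method comprises:
    receiving, by a first intermediate device of the at least one intermediate device, first data packets sent by the sending device, and sending the first data packets to the receiving device via a first path;
    receiving, by the first intermediate device, first acknowledgement packets that are sent by the receiving device to acknowledge the first data packets, and determining a congestion degree of the first path based on congestion marks in the first acknowledgement packets; and
    modifying, by the first intermediate device, a window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of data packets having a first identifier among the first data packets, and sending the modified first acknowledgement packets to the sending device, wherein the window value is used to notify the sending device of the number of bytes that the receiving device is able to receive, and the first identifier comprises a source IP address, a source port, a destination IP address, and a destination port.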
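  2. The method according to claim 1, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is mild, and the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, reducing, by the first intermediate device, the window value in the acknowledgement packets having a second identifier among the first acknowledgement packets, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier.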
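  3. The method according to claim 1 or 2, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, and there is no path from the first intermediate device to the receiving device that is mildly congested or not congested, reducing, by the first intermediate device, the window value in all acknowledgement packets among the first acknowledgement packets.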
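  4. The method according to any one of claims 1 to 3, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, and there is a second path from the first intermediate device to the receiving device that is mildly congested or not congested, setting, by the first intermediate device, the window value of the acknowledgement packets having a second identifier among the first acknowledgement packets to 0, and starting a timer, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier;
    and the method further comprises: after the timer expires, sending, by the first intermediate device to the receiving device via the second path, the data packets having the first identifier among second data packets received from the sending device.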
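  5. The method according to claim 4, characterized in that the method further comprises:
    before the timer expires, if the first intermediate device receives a second acknowledgement packet sent by the receiving device, and the second acknowledgement packet indicates that all data packets having the first identifier among the first data packets have been received by the receiving device, sending, by the first intermediate device to the receiving device via the second path, the data packets having the first identifier among third data packets received from the sending device.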
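  6. A congestion control method, applied to a data transmission network, wherein the data transmission network comprises a sending device, at least one intermediate device, and a receiving device, the sending device comprises a processor and a network interface controller (NIC), and the sending device sends data packets to the receiving device through the intermediate device, characterized in that the method comprises:
    receiving, by the NIC, first data packets sent by the processor, and sending the first data packets to the receiving device via a first path;
    receiving, by the NIC, first acknowledgement packets that are sent by the receiving device to acknowledge the first data packets, and determining a congestion degree of the first path based on congestion marks in the first acknowledgement packets; and
    modifying, by the NIC, a window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of data packets having a first identifier among the first data packets, and sending the modified first acknowledgement packets to the processor, wherein the window value is used to notify the processor of the number of bytes that the receiving device is able to receive, and the first identifier comprises a source IP address, a source port, a destination IP address, and a destination port.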
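  7. The method according to claim 6, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is mild, and the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, reducing, by the NIC, the window value in the acknowledgement packets having a second identifier among the first acknowledgement packets, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier.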
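  8. The method according to claim 6 or 7, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, and there is no path from the NIC to the receiving device that is mildly congested or not congested, reducing, by the NIC, the window value in all acknowledgement packets among the first acknowledgement packets.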
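  9. The method according to any one of claims 6 to 8, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, and there is a second path from the NIC to the receiving device that is mildly congested or not congested, setting, by the NIC, the window value of the acknowledgement packets having a second identifier among the first acknowledgement packets to 0, and starting a timer, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier;
    and the method further comprises: after the timer expires, sending, by the NIC to the receiving device via the second path, the data packets having the first identifier among second data packets received from the processor.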
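  10. The method according to claim 9, characterized in that the method further comprises:
    before the timer expires, if the NIC receives a second acknowledgement packet sent by the receiving device, and the second acknowledgement packet indicates that all data packets having the first identifier among the first data packets have been received by the receiving device, sending, by the NIC to the receiving device via the second path, the data packets having the first identifier among third data packets received from the processor.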
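  11. A network device, applied to a data transmission network, wherein the data transmission network comprises a sending device, at least one intermediate device, and a receiving device, the network device is a first intermediate device of the at least one intermediate device, and the sending device sends data packets to the receiving device through the intermediate device, characterized in that the network device comprises:
    a receiving unit, configured to receive first data packets sent by the sending device, and further configured to receive first acknowledgement packets that are sent by the receiving device to acknowledge the first data packets;
    a sending unit, configured to send the first data packets to the receiving device via a first path; and
    a processing unit, configured to determine a congestion degree of the first path based on congestion marks in the first acknowledgement packets, and to modify a window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of data packets having a first identifier among the first data packets, wherein the first identifier comprises a source IP address, a source port, a destination IP address, and a destination port;
    wherein the sending unit is further configured to send the modified first acknowledgement packets to the sending device, and the window value is used to notify the sending device of the number of bytes that the receiving device is able to receive.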
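  12. The network device according to claim 11, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is mild, and the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, the processing unit reduces the window value in the acknowledgement packets having a second identifier among the first acknowledgement packets, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier.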
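  13. The network device according to claim 11 or 12, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, and there is no path from the network device to the receiving device that is mildly congested or not congested, the processing unit reduces the window value in all acknowledgement packets among the first acknowledgement packets.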
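  14. The network device according to any one of claims 11 to 13, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, and there is a second path from the network device to the receiving device that is mildly congested or not congested, the processing unit sets the window value of the acknowledgement packets having a second identifier among the first acknowledgement packets to 0, and starts a timer, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier;
    and the sending unit is further configured to send, after the timer expires, to the receiving device via the second path, the data packets having the first identifier among second data packets that the receiving unit receives from the sending device.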
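  15. The network device according to claim 14, characterized in that
    the sending unit is further configured to: before the timer expires, if the receiving unit receives a second acknowledgement packet sent by the receiving device, and the second acknowledgement packet indicates that all data packets having the first identifier among the first data packets have been received by the receiving device, send, to the receiving device via the second path, the data packets having the first identifier among third data packets that the receiving unit receives from the sending device.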
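  16. A network interface controller (NIC), applied to a data transmission network, wherein the data transmission network comprises a sending device, at least one intermediate device, and a receiving device, the sending device comprises a processor and the NIC, and the sending device sends data packets to the receiving device through the intermediate device, characterized in that the NIC comprises:
    a receiving unit, configured to receive first data packets sent by the processor, and further configured to receive first acknowledgement packets that are sent by the receiving device to acknowledge the first data packets;
    a sending unit, configured to send the first data packets to the receiving device via a first path; and
    a processing unit, configured to determine a congestion degree of the first path based on congestion marks in the first acknowledgement packets, and to modify a window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of data packets having a first identifier among the first data packets, wherein the first identifier comprises a source IP address, a source port, a destination IP address, and a destination port;
    wherein the sending unit is further configured to send the modified first acknowledgement packets to the processor, and the window value is used to notify the processor of the number of bytes that the receiving device is able to receive.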
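  17. The NIC according to claim 16, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is mild, and the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, the processing unit reduces the window value in the acknowledgement packets having a second identifier among the first acknowledgement packets, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier.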
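  18. The NIC according to claim 16 or 17, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, and there is no path from the NIC to the receiving device that is mildly congested or not congested, the processing unit reduces the window value in all acknowledgement packets among the first acknowledgement packets.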
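  19. The NIC according to any one of claims 16 to 18, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, and there is a second path from the NIC to the receiving device that is mildly congested or not congested, the processing unit sets the window value of the acknowledgement packets having a second identifier among the first acknowledgement packets to 0, and starts a timer, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier;
    and the sending unit is further configured to send, after the timer expires, to the receiving device via the second path, the data packets having the first identifier among second data packets that the receiving unit receives from the processor.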
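  20. The NIC according to claim 19, characterized in that
    the sending unit is further configured to: before the timer expires, if the receiving unit receives a second acknowledgement packet sent by the receiving device, and the second acknowledgement packet indicates that all data packets having the first identifier among the first data packets have been received by the receiving device, send, to the receiving device via the second path, the data packets having the first identifier among third data packets that the receiving unit receives from the processor.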
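  21. A network device, applied to a data transmission network, wherein the data transmission network comprises a sending device, at least one intermediate device, and a receiving device, the network device is a first intermediate device of the at least one intermediate device, and the sending device sends data packets to the receiving device through the intermediate device, characterized in that the network device comprises:
    a receiver, configured to receive first data packets sent by the sending device, and further configured to receive first acknowledgement packets that are sent by the receiving device to acknowledge the first data packets;
    a transmitter, configured to send the first data packets to the receiving device via a first path; and
    a processor, configured to determine a congestion degree of the first path based on congestion marks in the first acknowledgement packets, and to modify a window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of data packets having a first identifier among the first data packets, wherein the first identifier comprises a source IP address, a source port, a destination IP address, and a destination port;
    wherein the transmitter is further configured to send the modified first acknowledgement packets to the sending device, and the window value is used to notify the sending device of the number of bytes that the receiving device is able to receive.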
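  22. The network device according to claim 21, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is mild, and the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, the processor reduces the window value in the acknowledgement packets having a second identifier among the first acknowledgement packets, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier.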
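  23. The network device according to claim 21 or 22, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, and there is a second path from the first intermediate device to the receiving device that is mildly congested or not congested, the processor sets the window value of the acknowledgement packets having a second identifier among the first acknowledgement packets to 0, and starts a timer, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier;
    and the transmitter is further configured to send, after the timer expires, to the receiving device via the second path, the data packets having the first identifier among second data packets that the receiver receives from the sending device.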
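  24. A network interface controller (NIC), applied to a data transmission network, wherein the data transmission network comprises a sending device, at least one intermediate device, and a receiving device, the sending device comprises a first processor and the NIC, and the sending device sends data packets to the receiving device through the intermediate device, characterized in that the NIC comprises:
    a receiver, configured to receive first data packets sent by the first processor, and further configured to receive first acknowledgement packets that are sent by the receiving device to acknowledge the first data packets;
    a transmitter, configured to send the first data packets to the receiving device via a first path; and
    a second processor, configured to determine a congestion degree of the first path based on congestion marks in the first acknowledgement packets, and to modify a window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of data packets having a first identifier among the first data packets, wherein the first identifier comprises a source IP address, a source port, a destination IP address, and a destination port;
    wherein the transmitter is further configured to send the modified first acknowledgement packets to the first processor, and the window value is used to notify the first processor of the number of bytes that the receiving device is able to receive.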
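  25. The NIC according to claim 24, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is mild, and the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, the second processor reduces the window value in the acknowledgement packets having a second identifier among the first acknowledgement packets, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier.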
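  26. The NIC according to claim 24 or 25, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, and there is a second path from the NIC to the receiving device that is mildly congested or not congested, the second processor sets the window value of the acknowledgement packets having a second identifier among the first acknowledgement packets to 0, and starts a timer, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier;
    and the transmitter is further configured to send, after the timer expires, to the receiving device via the second path, the data packets having the first identifier among second data packets that the receiver receives from the first processor.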
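  27. A network device, applied to a data transmission network, wherein the data transmission network comprises a sending device, at least one intermediate device, and a receiving device, the network device is the sending device, and the sending device sends data packets to the receiving device through the intermediate device, characterized in that the sending device comprises a processor and a network interface controller (NIC);
    the processor is configured to send first data packets to the NIC;
    the NIC is configured to receive the first data packets sent by the processor, and send the first data packets to the receiving device via a first path; and is further configured to receive first acknowledgement packets that are sent by the receiving device to acknowledge the first data packets, determine a congestion degree of the first path based on congestion marks in the first acknowledgement packets, modify a window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of data packets having a first identifier among the first data packets, and send the modified first acknowledgement packets to the processor, wherein the window value is used to notify the processor of the number of bytes that the receiving device is able to receive, and the first identifier comprises a source IP address, a source port, a destination IP address, and a destination port; and
    the processor is further configured to receive the modified first acknowledgement packets sent by the NIC, and to send second data packets to the receiving device through the NIC based on the modified first acknowledgement packets.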
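  28. The network device according to claim 27, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is mild, and the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, the NIC reduces the window value in the acknowledgement packets having a second identifier among the first acknowledgement packets, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier.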
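  29. The network device according to claim 27 or 28, characterized in that the modifying of the window value in the first acknowledgement packets based on the congestion degree of the first path and the number of bytes of the data packets having the first identifier among the first data packets comprises:
    if the congestion degree of the first path is severe, the number of bytes of the data packets having the first identifier among the first data packets is greater than a set threshold, and there is a second path from the NIC to the receiving device that is mildly congested or not congested, the NIC sets the window value of the acknowledgement packets having a second identifier among the first acknowledgement packets to 0, and starts a timer, wherein the second identifier comprises a source IP address, a source port, a destination IP address, and a destination port, and the source IP address, source port, destination IP address, and destination port in the first identifier are respectively the same as the destination IP address, destination port, source IP address, and source port in the second identifier;
    and the NIC is further configured to send, after the timer expires, to the receiving device via the second path, the data packets having the first identifier among second data packets that the NIC receives from the processor.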
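
For illustration only, the window-modification cases recited in the claims above can be condensed into one decision function. The Python sketch below is not the applicant's implementation. The names `Congestion`, `FlowId`, `Decision`, and `rewrite_window` are hypothetical, halving is an assumed reduction rule (the claims say only that the window value is reduced), and leaving the window unchanged in every case the claims do not recite is likewise an assumption.

```python
from enum import Enum
from typing import Callable, NamedTuple


class Congestion(Enum):
    NONE = 0
    MILD = 1
    SEVERE = 2


class FlowId(NamedTuple):
    """The 'first identifier': source IP, source port, destination IP, destination port."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def ack_id(self) -> "FlowId":
        # The 'second identifier' carried by acknowledgement packets is the
        # first identifier with source and destination swapped.
        return FlowId(self.dst_ip, self.dst_port, self.src_ip, self.src_port)


class Decision(NamedTuple):
    window: int        # window value to write into the acknowledgement packet
    start_timer: bool  # True when the flow is paused pending a path switch


def rewrite_window(level: Congestion, flow_bytes: int, threshold: int,
                   has_second_path: bool, window: int,
                   reduce: Callable[[int], int] = lambda w: w // 2) -> Decision:
    # Mild congestion: only throttle flows whose byte count exceeds the threshold.
    if level is Congestion.MILD and flow_bytes > threshold:
        return Decision(reduce(window), False)
    if level is Congestion.SEVERE:
        # No mildly congested or uncongested alternative: throttle every acknowledgement.
        if not has_second_path:
            return Decision(reduce(window), False)
        # An alternative exists: pause the large flow and start the timer.
        if flow_bytes > threshold:
            return Decision(0, True)
    # Cases not recited in the claims: leave the window untouched (assumption).
    return Decision(window, False)
```

A caller would look up the per-flow byte count under the data packets' `FlowId` and apply the returned window only to acknowledgement packets whose identifier equals `ack_id()`, except in the case where all acknowledgement packets are throttled.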
PCT/CN2018/084819 2017-05-15 2018-04-27 Congestion control method, network device, and network interface controller thereof WO2018210117A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18803065.4A EP3618372B1 (en) 2017-05-15 2018-04-27 Congestion control method and network device
US16/683,730 US11228534B2 (en) 2017-05-15 2019-11-14 Congestion control method, network device, and network interface controller

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710340116.3 2017-05-15
CN201710340116.3A CN108881056B (zh) 2017-05-15 2017-05-15 Congestion control method, network device, and network interface controller thereof

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/683,730 Continuation US11228534B2 (en) 2017-05-15 2019-11-14 Congestion control method, network device, and network interface controller

Publications (1)

Publication Number Publication Date
WO2018210117A1 true WO2018210117A1 (zh) 2018-11-22

Family

ID=64273383

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/084819 WO2018210117A1 (zh) 2017-05-15 2018-04-27 Congestion control method, network device, and network interface controller thereof

Country Status (4)

Country Link
US (1) US11228534B2 (zh)
EP (1) EP3618372B1 (zh)
CN (1) CN108881056B (zh)
WO (1) WO2018210117A1 (zh)

Also Published As

Publication number Publication date
EP3618372B1 (en) 2024-04-24
US20200084155A1 (en) 2020-03-12
US11228534B2 (en) 2022-01-18
EP3618372A1 (en) 2020-03-04
EP3618372A4 (en) 2020-04-15
CN108881056B (zh) 2022-02-25
CN108881056A (zh) 2018-11-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18803065

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018803065

Country of ref document: EP

Effective date: 20191126