CN108965121B - Method, host and switch for transmitting data


Info

Publication number
CN108965121B
Authority
CN
China
Prior art keywords
path
slice
host
switch
index
Prior art date
Legal status
Active
Application number
CN201710359609.1A
Other languages
Chinese (zh)
Other versions
CN108965121A (en)
Inventor
袁峰 (Yuan Feng)
李兆耕 (Li Zhaogeng)
毕军 (Bi Jun)
Current Assignee
Tsinghua University
Huawei Technologies Co Ltd
Original Assignee
Tsinghua University
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University, Huawei Technologies Co Ltd filed Critical Tsinghua University
Priority to CN201710359609.1A
Publication of CN108965121A
Application granted
Publication of CN108965121B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/12: Avoiding congestion; recovering from congestion
    • H04L 45/123: Evaluation of link metrics
    • H04L 45/24: Multipath routing
    • H04L 47/125: Avoiding congestion by balancing the load, e.g. traffic engineering
    • H04L 47/193: Flow control or congestion control at the transport layer, e.g. TCP related
    • H04L 47/2466: Traffic characterised by specific attributes, using signalling traffic
    • H04L 47/27: Evaluation or update of window size, e.g. using information derived from acknowledged (ACK) packets
    • H04L 47/283: Congestion control in relation to timing considerations, e.g. jitter or round-trip time (RTT)
    • H04L 47/33: Flow control or congestion control using forward notification
    • H04L 49/25: Routing or path finding in a switch fabric

Abstract

An embodiment of the invention provides a method for transmitting data, including the following steps: a first host adds a first index to each packet of a first slice of a first data stream, where the first index is used by a switch to determine, among multiple available paths from the first host to a second host, a first path corresponding to the first index; and the first host sends the first slice of the first data stream to the switch, where the last packet of the first slice is determined by the first host according to path condition information of the first path used to forward the first slice, so that the switch forwards the first slice to the second host over the first path. In this embodiment of the present invention, because the first host adds an index to each slice of the first data stream, it can learn the path condition information of each slice's transmission path, and can thereby avoid the problem of the actual transmission path of the first data stream being inconsistent with the path indicated by the host-side congestion window.

Description

Method, host and switch for transmitting data
Technical Field
The embodiments of the present invention relate to the field of communications, and in particular, to a method, a host, and a switch for transmitting data.
Background
Currently, Internet data is growing explosively. For example, the number of registered users of Sina Weibo in China has exceeded 300 million, the number of active users of Tencent's instant messenger has reached 710 million, and the number of Facebook users worldwide is approaching 1 billion. According to a report issued by International Data Corporation (Digital Universe Study, 2011), the total amount of information in the world doubles every two years. The emergence of big data is forcing enterprises to continuously increase their data-processing capacity, with the data center as the platform.
Years of research and practice have shown that a data center network based on the Clos architecture has better scalability and a larger number of equal-cost paths than a traditional tree topology. Under the Clos architecture, a data center network can be built from devices of the same specification, without expensive aggregation devices, so the Clos architecture is being deployed more and more widely in the industry.
In the prior art, a data center network based on the Clos architecture may use a conventional equal-cost multi-path (ECMP) hash mechanism for data transmission. Hashing here refers to mapping M input values onto N corresponding results by computation. However, the conventional ECMP hash mechanism hashes per flow; that is, each flow corresponds to exactly one path. Thus, even though multiple forwarding paths exist in the network for each flow, when multiple flows are forwarded at the same time, different flows may be hashed onto the same forwarding path, congesting that path.
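The per-flow hash behaviour described above can be sketched as follows. This is a minimal illustration, not the hash function of any particular switch, and the addresses and ports are hypothetical:

```python
import hashlib

def ecmp_path(five_tuple, num_paths):
    """Classic per-flow ECMP: hash the five-tuple and take it modulo the
    number of equal-cost paths, so every packet of a flow follows the
    same path."""
    key = "|".join(str(f) for f in five_tuple).encode()
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

flow_a = ("202.100.1.2", "100.1.1.2", "TCP", 20000, 8080)
flow_b = ("202.100.1.3", "100.1.1.2", "TCP", 20001, 8080)

# Every packet of flow_a maps to the same path index; flow_b may
# collide with it on that path even while other paths stay idle.
path_a = ecmp_path(flow_a, 4)
path_b = ecmp_path(flow_b, 4)
```

Because the mapping depends only on the five-tuple, two heavy flows that hash to the same value share one path no matter how lightly loaded the other paths are.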
For example, as shown in fig. 1, four flows A, B, C and D are forwarded from different sources to different destinations. Flow A and flow B suffer a local collision at the first middle-tier device on the left, and flow C and flow D may suffer a downstream collision at the second top-tier device on the left.
To solve the above problems, Cisco proposed CONGA, a distributed congestion-aware load-balancing scheme for data centers implemented on the basis of flowlets, replacing the traditional flow-based hash load balancing; a flowlet is a finer-grained unit than a flow. Under the CONGA flowlet mechanism, each flow can be spread over at least two forwarding paths, achieving a better load-balancing effect.
However, since the CONGA mechanism is implemented on the switch, flowlets are used only in the load-balancing routing decision, while the sender (host) still performs congestion control per flow. This can make the congestion state of the actual transmission path inconsistent with the sender's congestion window, reducing data transmission efficiency.
For example, as shown in fig. 2, flow A can be transmitted from source switch L0 to destination switch L1 via two paths, L0-S0-L1 and L0-S1-L1, where the link bandwidth of L0-S0-L1 is 10 Gbps and the link bandwidth of L0-S1-L1 is 1 Gbps. Flow A from the sender can be split into flowlet A1 and flowlet A2. When sending flow A, the sender controls the sending rate through a rate-control algorithm and adjusts the rate according to the detected packet loss on the path. The sender first sends flowlet A1 of flow A at a rate of 1 Gbps, and L0 selects path L0-S1-L1 through the CONGA algorithm to forward A1. Finding that the flow A traffic just transmitted suffered no congestion, the sender increases the rate, for example sending flowlet A2 at 2 Gbps. L0 continues to run CONGA; since the congestion degree of path L0-S0-L1 is lower than that of path L0-S1-L1, L0 selects path L0-S0-L1 to forward flowlet A2 according to the CONGA algorithm. However, since the available bandwidth of path L0-S0-L1 is only 1 Gbps, the 2 Gbps rate of flowlet A2 exceeds it; L0-S0-L1 becomes congested, and some of the packets in flowlet A2 are discarded.
From the above analysis, because the CONGA mechanism is implemented on the switch and flowlets are used only in the load-balancing routing decision, the host side still performs congestion control per flow; that is, the host does not know the actual transmission path of flow A. In other words, the host-side congestion window reflects the congestion state of path L0-S1-L1, which carried flowlet A1, while the actual transmission path of flowlet A2 is L0-S0-L1. The actual transmission path of flow A is therefore inconsistent with the transmission path of flow A indicated by the host-side congestion window; the congestion window cannot accurately indicate the congestion state of the actual transmission path, and data transmission efficiency suffers.
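The mismatch can be reproduced with a toy model of the figure-2 scenario. The utilisation and available-bandwidth numbers are illustrative assumptions chosen to be consistent with the example above (e.g. other traffic already occupying most of the 10 Gbps link):

```python
def conga_pick(utilisation):
    """CONGA picks the least-utilised path for each new flowlet."""
    return min(utilisation, key=utilisation.get)

# Path state after flowlet A1 was sent at 1 Gbps over L0-S1-L1:
# L0-S1-L1 (1 Gbps link) is fully used; L0-S0-L1 (10 Gbps link)
# carries other traffic, leaving only 1 Gbps available.
utilisation = {"L0-S0-L1": 0.9, "L0-S1-L1": 1.0}
available_gbps = {"L0-S0-L1": 1.0, "L0-S1-L1": 0.0}

# The host, having seen no loss, raises its rate to 2 Gbps for A2,
# but its congestion window still reflects L0-S1-L1.
rate_a2 = 2.0
path_a2 = conga_pick(utilisation)                     # switch reroutes
excess = max(0.0, rate_a2 - available_gbps[path_a2])  # traffic dropped
```

The switch's choice (L0-S0-L1) and the host's window (tuned on L0-S1-L1) disagree, and the 1 Gbps of excess traffic is lost.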
Disclosure of Invention
The present application provides a method for transmitting data, a host and a switch, which can effectively improve data transmission efficiency in an equal-cost multi-path scenario.
In a first aspect, a method for transmitting data is provided, the method comprising:
a first host adds a first index to each packet of a first slice of a first data stream, where the first index is used by a switch to determine, among multiple available paths from the first host to a second host, a first path corresponding to the first index;
the first host sends the first slice of the first data stream to the switch, where the last packet of the first slice is determined by the first host according to path condition information of the first path used to forward the first slice, and the first slice includes the first index, so that the switch forwards the first slice to the second host over the first path.
In the embodiment of the present invention, when a first host needs to send a first data stream to a second host, it adds an index to each slice of the stream. The host can thereby learn the path condition information of each slice's transmission path and decide, according to the path condition information of the current slice's path, whether to slice the first data stream again. This avoids the problem of the actual transmission path of the first data stream being inconsistent with the path indicated by the host-side congestion window, and thus effectively guarantees data transmission efficiency.
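A minimal host-side sketch of this step follows. The `Packet` structure and its field name are hypothetical; per the description, a real implementation would carry the index in a field outside the five-tuple:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    payload: bytes
    index: int = -1  # path index the switch will map to a path

def tag_slice(packets, index):
    """Stamp every packet of one slice with the same path index, so the
    whole slice is forwarded over the path bound to that index."""
    for p in packets:
        p.index = index
    return packets

slice1 = tag_slice([Packet(b"seg0"), Packet(b"seg1"), Packet(b"seg2")],
                   index=0)
```

Because every packet of the slice carries index 0, the host knows which path's condition information its congestion feedback describes.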
In some possible implementations, the method further includes:
the first host ends sending the first slice when the path condition information of the first path indicates that the first path is congested, and adds a second index to each packet of a second slice of the first data stream following the first slice, where the second index is used by the switch to determine, among the multiple available paths, a second path corresponding to the second index, the second path being different from the first path;
and the first host sends the second slice to the switch, so that the switch forwards the second slice to the second host over the second path.
In the data transmission method of this embodiment of the present invention, sending of the first slice is ended only when the path condition information of the first path indicates that the first path is congested, so the first host keeps the number of slices of the first data stream to a minimum.
In some possible implementations, the method further includes:
after sending a predetermined length of the second slice, the first host ends or continues sending the second slice according to the path condition information of the first path and the path condition information of the second path.
In some possible implementations, after sending a predetermined length of the second slice, the first host ends or continues sending the second slice according to the path condition information of the first path and of the second path, including:
the first host ends sending the second slice when the path condition information of the first path indicates that the first path is not congested;
the method further comprises the following steps:
the first host adds the first index to each packet of a third slice of the first data stream following the second slice;
the first host sends the third slice to the switch, the third slice including the first index, so that the switch forwards the third slice to the second host over the first path.
In some possible implementations, after sending a predetermined length of the second slice, the first host ends or continues sending the second slice according to the path condition information of the first path and of the second path, including:
the first host continues sending the second slice when the path condition information of the first path indicates that the first path is congested and the path condition information of the second path indicates that the second path is not congested.
In some possible implementations, after sending a predetermined length of the second slice, the first host ends or continues sending the second slice according to the path condition information of the first path and of the second path, including:
the first host ends sending the second slice when the path condition information indicates that both the first path and the second path are congested;
the method further comprises the following steps:
the first host adds a third index to each packet of a third slice of the first data stream following the second slice, where the third index is used by the switch to determine, among the multiple available paths, a third path corresponding to the third index, and the first path, the second path and the third path are different from one another;
the first host sends the third slice to the switch, the third slice including the third index, so that the switch forwards the third slice to the second host over the third path.
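The three cases above can be collected into one decision routine. This is a hypothetical sketch: `unused_indexes` stands for indexes not yet bound to a congested path, and the return convention is invented for illustration:

```python
def after_probe(first_congested, second_congested, first_index,
                unused_indexes):
    """Decide what to do once a predetermined length of the second
    slice has been sent.

    Returns ("continue", None) to keep sending the second slice, or
    ("new_slice", idx) to end it and start a third slice on idx."""
    if not first_congested:
        return ("new_slice", first_index)    # go back to the first path
    if not second_congested:
        return ("continue", None)            # second path still healthy
    return ("new_slice", unused_indexes[0])  # both congested: new path

# One outcome per case described in the text:
case1 = after_probe(False, True, 0, [2])   # first path recovered
case2 = after_probe(True, False, 0, [2])   # only second path healthy
case3 = after_probe(True, True, 0, [2])    # both congested
```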
In some possible implementations, before the first host adds the first index to each packet of the first slice of the first data stream, the method further includes:
the first host obtains the total number of the multiple available paths, so that the first host can determine, according to that total, the number of indexes that can be added to the first data stream.
In some possible implementations, the obtaining, by the first host, the total number of the plurality of available paths includes:
the first host sends a first Transmission Control Protocol (TCP) link request message to the switch, where the first TCP link request message includes a first request message used to request the total number of the available paths, so that the switch adds the total number of the available paths to the first TCP link request message and generates a second TCP link request message to be sent to the second host;
and the first host receives a response message to the second TCP link request message forwarded by the switch, the response message including the total number of the available paths.
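The host side of this exchange might look as follows. Dictionaries stand in for the TCP option encoding, whose wire format the text does not specify; the option names are hypothetical:

```python
def build_first_request():
    """First TCP link request message: a SYN carrying a request for
    the total number of available paths."""
    return {"flags": "SYN", "options": {"req_path_count": True}}

def usable_indexes(response):
    """Derive the set of addable indexes from the total echoed back in
    the response to the second TCP link request message."""
    total = response["options"]["total_paths"]
    return list(range(total))

indexes = usable_indexes({"flags": "SYN+ACK",
                          "options": {"total_paths": 4}})
```

With four available paths the host may stamp its slices with indexes 0 through 3.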
In some possible implementations, before the first host adds the second index to each packet of the second slice of the first data stream, the method further includes:
and the first host receives the path condition information of the first path sent by the server corresponding to the second host.
In some possible implementations, the path condition information includes a round-trip time (RTT) and/or an explicit congestion notification (ECN).
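A sketch of how a host might fold RTT and ECN feedback into a per-path congestion verdict. The threshold value is an illustrative assumption, not from the text:

```python
def path_congested(feedback, rtt_threshold_ms=1.0):
    """Treat a path as congested if any acknowledgement carried an ECN
    mark or reported an RTT above the (hypothetical) threshold."""
    return any(f["ecn"] or f["rtt_ms"] > rtt_threshold_ms
               for f in feedback)

quiet = [{"rtt_ms": 0.2, "ecn": False}, {"rtt_ms": 0.3, "ecn": False}]
marked = [{"rtt_ms": 0.2, "ecn": True}]
```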
In a second aspect, a method of transmitting data is provided, the method comprising:
a switch receives a first slice of a first data stream sent by a first host, where the last packet of the first slice is determined by the first host according to path condition information of a first path used to forward the first slice, and the first slice includes a first index used by the switch to determine, among multiple available paths from the first host to a second host, the first path corresponding to the first index;
the switch forwards the first slice to the second host over the first path.
In some possible implementations, before the switch forwards the first slice to a second host through the first path, the method further includes:
the switch establishes a correspondence table recording the index corresponding to each of the multiple available paths;
and the switch determines the first path according to the correspondence table and the first index.
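The correspondence table can be sketched as a simple mapping. The port names are hypothetical, and a real switch would program this table into its forwarding hardware:

```python
class CorrespondenceTable:
    """Maps each path index to one of the available paths (represented
    here by the egress port leading to that path)."""

    def __init__(self, paths):
        self._table = dict(enumerate(paths))

    def lookup(self, index):
        """Return the path bound to this index."""
        return self._table[index]

table = CorrespondenceTable(["uplink-to-S0", "uplink-to-S1"])
first_path = table.lookup(0)
```

On receiving a slice, the switch reads the index from any of its packets and forwards the whole slice out the looked-up port.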
In some possible implementations, the method further includes:
the switch finishes receiving the first slice and, after the first slice, receives a second slice of the first data stream sent by the first host, where the second slice includes a second index used by the switch to determine, among the multiple available paths, a second path corresponding to the second index, the second path being different from the first path;
and the switch forwards the second slice to the second host over the second path.
In some possible implementations, the method further includes:
and after receiving a predetermined length of the second slice, the switch finishes receiving the second slice or continues receiving it.
In some possible implementations, before the switch receives the first slice of the first data flow sent by the first host, the method further includes:
the switch receives a first Transmission Control Protocol (TCP) link request message sent by the first host, where the first TCP link request message includes a first request message used to request the total number of the multiple available paths from the first host to the second host, so that the first host can determine, according to that total, the number of indexes that can be added to the first data stream;
the switch adds the total number of the available paths to the first TCP link request message to form a second TCP link request message;
the switch sends the second TCP link request message to the second host;
the switch receives a response message to the second TCP link request message sent by the second host, the response message including the total number of the available paths;
and the switch sends the response message to the first host.
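The switch's part of this exchange might be sketched as follows, again with dictionaries standing in for the unspecified wire format and hypothetical option names:

```python
def switch_handle_request(first_msg, ecmp_paths):
    """If the first TCP link request message asks for the path count,
    add the total to form the second TCP link request message."""
    msg = dict(first_msg)
    if msg.get("options", {}).get("req_path_count"):
        msg["options"] = {**msg["options"],
                          "total_paths": len(ecmp_paths)}
    return msg

second_msg = switch_handle_request(
    {"flags": "SYN", "options": {"req_path_count": True}},
    ["L0-S0-L1", "L0-S1-L1", "L0-S2-L1", "L0-S3-L1"])
```

The switch then forwards `second_msg` to the second host and later relays the response, now carrying the total, back to the first host.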
In some possible implementations, the switch may re-determine the transmission path by round-robin (RR) scheduling and forward the first slice to the second host over the re-determined transmission path.
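Round-robin re-selection can be sketched with a cycling iterator (a minimal illustration, with hypothetical path names):

```python
from itertools import cycle

def make_rr_scheduler(paths):
    """Return a callable that yields the available paths in
    round-robin order, one per call."""
    it = cycle(paths)
    return lambda: next(it)

pick = make_rr_scheduler(["L0-S0-L1", "L0-S1-L1"])
choices = [pick() for _ in range(4)]
```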
In a third aspect, a host is provided, which includes:
a processing unit, configured to add a first index to each packet of a first slice of a first data stream, where the first index is used by a switch to determine, among multiple available paths from a first host to a second host, a first path corresponding to the first index;
a transceiving unit, configured to send the first slice of the first data stream to the switch, where the last packet of the first slice is determined by the first host according to path condition information of the first path used to forward the first slice, and the first slice includes the first index, so that the switch forwards the first slice to the second host over the first path.
The host of the third aspect is capable of implementing the method of the first aspect and implementations thereof.
In a fourth aspect, a host is provided, the host comprising:
a processor, configured to add a first index to each packet of a first slice of a first data stream, where the first index is used by a switch to determine, among multiple available paths from the first host to a second host, a first path corresponding to the first index;
a port, configured to send the first slice of the first data stream to the switch under control of the processor, where the last packet of the first slice is determined by the first host according to path condition information of the first path used to forward the first slice, and the first slice includes the first index, so that the switch forwards the first slice to the second host over the first path.
The host of the fourth aspect is capable of implementing the method of the first aspect and its various implementations.
In a fifth aspect, a switch is provided, the switch comprising:
a first transceiving unit, configured to receive a first slice of a first data stream sent by a first host, where the last packet of the first slice is determined by the first host according to path condition information of a first path used to forward the first slice, and the first slice includes a first index used by the switch to determine, among multiple available paths from the first host to a second host, the first path corresponding to the first index;
and the second transceiving unit is used for forwarding the first slice to a second host through the first path.
In a sixth aspect, a switch is provided, the switch comprising a processor, a first port and a second port, the processor being configured to:
controlling the first port to receive a first slice of a first data stream sent by a first host, where the last packet of the first slice is determined by the first host according to path condition information of a first path used to forward the first slice, and the first slice includes a first index used by the switch to determine, among multiple available paths from the first host to a second host, the first path corresponding to the first index;
and controlling the second port to forward the first slice to the second host through the first path.
In a seventh aspect, a computer-readable storage medium is provided, which stores a program that causes a host to execute the method of the first aspect or any possible implementation manner of the first aspect.
In an eighth aspect, a computer-readable storage medium is provided, which stores a program that causes a switch to perform the method in the second aspect or any possible implementation manner of the second aspect.
In a ninth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
Drawings
FIG. 1 is a schematic diagram of path congestion under a conventional Clos architecture;
fig. 2 is a schematic diagram of a data transmission method of the CONGA scheme;
FIG. 3 is a schematic block diagram of a 2-tier, 3-stage Clos architecture;
FIG. 4 is a schematic block diagram of a 3-tier, 5-stage Clos architecture;
FIG. 5 is a schematic diagram of a data transmission method according to an embodiment of the invention;
fig. 6 is a schematic structural diagram of a flowlet in the CONGA scheme;
FIG. 7 is another diagram illustrating a data transmission method according to an embodiment of the invention;
FIG. 8 is a schematic block diagram of a host according to an embodiment of the present invention;
FIG. 9 is another schematic block diagram of a host according to an embodiment of the present invention;
FIG. 10 is a schematic block diagram of a switch according to an embodiment of the present invention;
fig. 11 is another schematic block diagram of a switch according to an embodiment of the present invention.
Detailed Description
The Clos architecture will be described with reference to the accompanying drawings.
Clos is a network architecture proposed by Charles Clos in 1953 to solve the problem of non-blocking switching in the telephone network. A Clos network is mainly characterized as follows: each switching node of one stage is connected to all the switching nodes of the next stage, and there is exactly one connection between a switching node of one stage and a switching node of the next stage. A Clos architecture is generally described in terms of tiers and stages, where "tier" refers to the number of layers of the network, and a "stage" can be understood as the number of devices on any path from any input of the network to any output.
For example, the Clos architecture shown in FIG. 3 is an example of a 2-tier (2-tier), 3-stage (3-stage) Clos architecture. It includes top-tier devices and bottom-tier devices, where a bottom-tier device may also be called a leaf and a top-tier device a spine. In the Clos architecture, the bottom-tier device may also be referred to as an edge switch, and the top-tier device as a core switch.
It should be understood that FIG. 3 is only an exemplary illustration of the Clos architecture and should not be taken as limiting the Clos architectures to which the present invention is applicable. That is, the Clos architecture used in the present invention can have any number of tiers and stages, such as 4 tiers and 7 stages. For example, the Clos architecture shown in FIG. 4 is an example of a 3-tier, 5-stage Clos architecture, which includes top-tier devices, middle-tier devices and bottom-tier devices. A middle-tier device may also be referred to as an aggregation device (Agg).
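The defining property, namely that every node of one stage connects to all nodes of the next, makes the number of equal-cost leaf-to-leaf paths easy to count in a 2-tier Clos. A small sketch with hypothetical sizes:

```python
def build_2tier_clos(num_leaves, num_spines):
    """Return the leaf-spine link set of a 2-tier (3-stage) Clos:
    every leaf connects to every spine."""
    return {(f"leaf{l}", f"spine{s}")
            for l in range(num_leaves) for s in range(num_spines)}

links = build_2tier_clos(num_leaves=8, num_spines=4)

# Between hosts under two different leaves there is exactly one
# equal-cost path per spine.
equal_cost_paths = 4
```

This path count is the "total number of the multiple available paths" that the host later learns from the switch.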
In the data transmission method of the embodiment of the invention, when the first host needs to send the first data stream to the second host, the first host determines the transmission path of the first data stream and adds an index to the first data stream to indicate its transmission path to the switch. This avoids the problem of the actual transmission path of the first data stream being inconsistent with the path indicated by the host-side congestion window, and thus effectively guarantees data transmission efficiency.
Fig. 5 is a schematic flow chart diagram of a method 100 of transmitting data in an embodiment of the present invention.
As shown in fig. 5, the method 100 includes:
the first host adds a first index to each packet of the first slice of the first data stream 110.
The first host sends 120 the first slice of the first data flow to the switch.
The switch forwards 130 the first slice to the second host via the first path.
Specifically, the first host adds a first index to each packet of a first slice of a first data stream, where the first index is used by a switch connected to the first host to determine, among multiple available paths from the first host to a second host, a first path corresponding to the first index. The first host then sends the first slice of the first data stream to the switch, the last packet of the first slice being determined by the first host according to the path condition information of the first path used to forward the first slice, so that the switch forwards the first slice to the second host over the first path.
It should be understood that the first index in the embodiment of the present invention is used to enable the switch to determine the first path corresponding to the first index among the multiple available paths; that is, the first index may be a path identifier, a port number, or an identifier in another form, as long as it can identify the first path. The embodiment of the present invention places no specific limitation on this.
In 110, the first host adds a first index to each packet of a first slice of a first data stream, where the first index is used by a switch connected to the first host to determine, among multiple available paths from the first host to a second host, a first path corresponding to the first index. This ensures that the path status indicated by the congestion window of the first host is the status of the path corresponding to the first index, that is, the status of the first path carrying the first slice to which the first index was added. In other words, the first host can perform congestion control according to the actual transmission path of the first slice.
Optionally, the first host may add an index identifier (ID) field outside the five-tuple.
In an embodiment of the present invention, the first data stream may be a set of TCP/IP packets uniquely identified by a specific five-tuple (five key fields). For example, a user initiates a Hypertext Transfer Protocol (HTTP) access from a host with Internet Protocol (IP) address 202.100.1.2 to a web server with IP address 100.1.1.2. This is a Transmission Control Protocol (TCP) flow, and the other two fields of its five-tuple are source TCP port 20000 and destination TCP port 8080. In other words, this TCP flow passing through the network can be uniquely identified by the five-tuple (202.100.1.2, 100.1.1.2, TCP, 20000, 8080).
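The five-tuple key from this example can be written down directly (illustrative only):

```python
from collections import namedtuple

FiveTuple = namedtuple(
    "FiveTuple", ["src_ip", "dst_ip", "proto", "src_port", "dst_port"])

# The HTTP access from the text, keyed by its five-tuple:
flow = FiveTuple("202.100.1.2", "100.1.1.2", "TCP", 20000, 8080)

# Any flow table on the path can use this tuple as a unique key.
flow_table = {flow: "state-for-this-TCP-flow"}
```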
It should be understood that the above is only an exemplary illustration of the first data stream, and the specific content of the first data stream is not limited by the embodiment of the present invention.
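The five-tuple identification described above can be sketched as follows. This is a minimal illustration using the addresses and ports from the example; the packet representation as a dictionary is an assumption for demonstration purposes, not part of the patent.

```python
from collections import namedtuple

# A TCP/IP flow is uniquely identified by its five-tuple:
# source IP, destination IP, protocol, source port, destination port.
FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip proto src_port dst_port")

def flow_key(packet):
    """Extract the five-tuple that identifies the flow a packet belongs to."""
    return FiveTuple(packet["src_ip"], packet["dst_ip"],
                     packet["proto"], packet["src_port"], packet["dst_port"])

# The HTTP access from the example: host 202.100.1.2 -> web server 100.1.1.2.
pkt = {"src_ip": "202.100.1.2", "dst_ip": "100.1.1.2",
       "proto": "TCP", "src_port": 20000, "dst_port": 8080}
key = flow_key(pkt)
```

All packets yielding the same key belong to the same flow, which is what allows the host to attach a per-slice index alongside these fields.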
In the embodiment of the present invention, since the index added by the first host to the first data stream is used by the switch to determine the first path corresponding to the first index among the plurality of available paths, the first host needs to know, before adding an index, how many different indexes can be added.
Optionally, before the first host adds the first index to each packet of the first slice of the first data stream, the first host needs to obtain a total number of available paths from the first host to the second host, so that the first host determines, according to the total number of the available paths, the number of indexes that can be added in the first data stream.
For example, the first host may perceive the end-to-end equal-cost paths in the network based on an ECMP (equal-cost multi-path) path list sent by the switch. It is to be understood that the ECMP path list may be a list, stored in the switch, of the plurality of available paths from the first host to the second host. In this way, the first host can discover the number of the plurality of available paths.
Specifically, when the first host obtains the total number of available paths from the first host to the second host, the first host sends a first TCP link request message to the switch, where the first TCP link request message includes a first request message used to request the total number of the available paths. The switch adds the total number of the available paths to the first TCP link request message, generating a second TCP link request message to be sent to the second host. The first host then receives, forwarded by the switch, a response message to the second TCP link request message, where the response message includes the total number of the available paths.
For example, as shown in fig. 7, a Clos architecture includes a highest-layer device, a lowest-layer device, a first host (10.0.0.2/8) and a second host (40.0.0.2/8). The lowest-layer device includes a switch L1 (10.0.0.1/8) and a switch L2 (40.0.0.1/8), and the highest-layer device includes a switch S1 (20.0.0.1/8) and a switch S2 (30.0.0.1/8). There are then 2 available paths from the first host to the second host, namely P10-P11-P21-P22-P41-P40 and P10-P12-P31-P32-P42-P40.
Referring to fig. 7, switch L1 receives a new TCP SYN flow sent by the first host and looks up, through its routing table, the equal-cost paths of the next hop from switch L1 toward the second host. It identifies the total number of equal-cost paths (2), marks this number in a specific field of the message, and sends it as a label (tag) to the second host. If the second host supports recognizing the specific field, the second host carries the tag in the specific field of its TCP ACK message and sends the TCP ACK message carrying the tag back to the first host. The first host receives the TCP ACK message carrying the tag, extracts the tag from the specific field, and thus obtains the number of equal-cost paths from the first host to the second host across the whole network.
That is, the switch receives a first TCP link request message sent by the first host, where the TCP link request message includes the first request message, and the first request message is used to request to obtain the total number of available paths from the first host to the second host; the exchanger adds the total number of the available paths in the first TCP link request message to form a second TCP link request message; the switch sends the second TCP link request message to the second host; the switch receives a response message of the second TCP link request message sent by the second host, wherein the response message comprises the total number of the available paths; the switch sends the response message to the first host.
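The path-count discovery exchange above can be sketched as follows. This is a hypothetical simulation under stated assumptions: the message representation, the field name `path_count_tag`, and the per-destination ECMP table are all illustrative names, not fields defined by the patent or by TCP.

```python
def switch_tag_syn(syn_msg, ecmp_table, dst):
    """Switch: count the equal-cost next hops toward dst and tag the SYN."""
    tagged = dict(syn_msg)
    tagged["path_count_tag"] = len(ecmp_table[dst])
    return tagged

def second_host_ack(tagged_syn):
    """Second host: echo the tag back in the specific field of the TCP ACK."""
    return {"type": "ACK", "path_count_tag": tagged_syn.get("path_count_tag")}

def first_host_learn(ack_msg):
    """First host: extract the total number of end-to-end equal-cost paths."""
    return ack_msg["path_count_tag"]

# The fig. 7 topology: two equal-cost uplinks from L1 toward the second host.
ecmp = {"40.0.0.2": ["P11", "P12"]}
syn = {"type": "SYN", "dst": "40.0.0.2"}
tagged = switch_tag_syn(syn, ecmp, "40.0.0.2")
n_paths = first_host_learn(second_host_ack(tagged))
```

After this handshake the first host knows it may use `n_paths` distinct index values when slicing the data stream.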
It should be noted that in the embodiment of the present invention, the first host may perceive an end-to-end equivalent path in the network based on the advertisement of the ECMP equivalent path sent by the switch. However, the implementation manner of obtaining the multiple available paths is not particularly limited in the embodiment of the present invention.
For example, the first host may also send a request message for requesting the multiple available paths directly to the leaf switch, and the switch, after receiving the request message, sends a response message of the request message directly to the first host to inform the first host of the multiple available paths.
Optionally, before the first host adds the second index to each packet of the second slice of the first data stream, the first host further needs to obtain path status information of the first path. Specifically, after the first host sends a message to the second host, the second host replies a response message to the first host, where the response message includes the path status information of the path through which the message is transmitted. In this embodiment of the present invention, for the first host, the first host may perform slicing according to the path status information of the current path.
For example, the first host receives the path status information of the first path sent by the server corresponding to the second host, and slices the first data stream according to the path status information of the first path.
It should be noted that, in the CONGA scheme, a path matrix table may be established on each leaf switch, where the path matrix table represents all reachable paths from the leaf switch to other leaf switches and the congestion degree of each reachable path; this table may also be referred to as a congestion-to-leaf table.
For example, each row in the path matrix table may represent a destination leaf switch, and the columns may represent all the uplink ports of the local leaf switch. For example, if a leaf switch has 48 ports, 24 of which are uplinks to spine switches, and the entire network has 100 leaf switches, the size of the path matrix table on each leaf switch may be 100 x 24. The values in the path matrix table are the congestion degrees of the paths. For example, if the physical bandwidth of a path is 10 Gbps and the currently carried traffic is 5 Gbps, the congestion degree of the path is 5/10 = 0.5.
In particular, when the Clos architecture is a 2-stage Clos architecture, the forwarding path is uniquely determined after the upstream port of the leaf switch and the destination switch are determined.
That is, in the CONGA scheme, when a leaf switch receives a data stream, it determines, according to the destination address, which row of the path matrix table to look up. It then searches all the columns of that row and ranks the congestion degrees from all the uplink ports to the destination switch, so that the egress port (path) with the lowest congestion degree can be selected for forwarding, thereby avoiding congestion.
Specifically, a source leaf switch (source leaf) detects a flow, and if the current message belongs to a new flow, a destination leaf switch is searched in a path matrix table; finding all output ports which can be forwarded according to the destination leaf switch; comparing according to the congestion degree of the output ports, and selecting the port with the lowest path congestion degree for forwarding; if the table contents are empty, or congestion level information to the destination leaf switch has not been initially established, then one of the egress ports available for forwarding is randomly selected. And if the current message belongs to the previous flow, finding the output port corresponding to the previous flow for forwarding.
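The port-selection steps above can be sketched as follows. This is a minimal model of the lookup, assuming the path matrix table is a nested dictionary keyed by destination leaf and uplink port; the concrete data layout on a real switch would differ.

```python
import random

def select_uplink(path_table, dst_leaf, ports):
    """CONGA-style selection: pick the uplink port with the lowest
    congestion degree toward dst_leaf. Fall back to a random forwardable
    port when no congestion information has been established yet."""
    row = path_table.get(dst_leaf)
    if not row:
        return random.choice(ports)
    return min(ports, key=lambda p: row.get(p, 0.0))

# Congestion degree = carried traffic / physical bandwidth (e.g. 5/10 = 0.5).
table = {"leaf2": {"P11": 0.5, "P12": 0.2}}
best = select_uplink(table, "leaf2", ["P11", "P12"])
```

With the table above, `P12` is chosen because its congestion degree (0.2) is the lowest among the candidate uplinks.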
It can be seen from this that the essential idea of CONGA is that the source leaf switch cuts one flow into multiple flowlets, and different flowlets take different paths, thereby achieving load balancing of flows of different sizes at a finer granularity.
In other words, in the CONGA scheme, the source leaf switch hashes different flowlets onto different paths, while the host side performs congestion control per flow. As a result, the actual transmission path of the flow may be inconsistent with the transmission path indicated by the host-side congestion window, so the congestion window on the host side may not accurately reflect the congestion state of the actual transmission path, thereby affecting data transmission efficiency.
However, in the embodiment of the present invention, when the first host needs to send the first data stream to the second host, the first host adds an index to each slice of the first data stream, so that the first host can know path status information of a transmission path corresponding to the index in each slice. In other words, the first host can avoid the problem that the actual transmission path of the first data stream is inconsistent with the path indicated by the host side congestion window through the slice corresponding to the index and the path status information of the transmission path corresponding to the index, thereby effectively ensuring the transmission efficiency of the data.
A first host sends a first slice of the first data flow to a switch connected to the first host 120.
Specifically, the first host sends a first slice of the first data stream to the switch, a last packet of the first slice is determined by the first host according to path condition information of a first path for forwarding the first slice, and the first slice includes the first index, so that the switch forwards the first slice to the second host through the first path.
That is, in the embodiment of the present invention, the first host ends sending the first slice when the path status information of the first path indicates that the first path is in a congestion state. If the path status information of the first path never indicates congestion while the first data stream is being sent, the first data stream does not need to be sliced, and the entire first data stream constitutes the first slice. If the path status information indicates congestion during sending and the remainder of the first data stream needs to be sliced again, the first data stream may include multiple slices.
Specifically, after the first host sends a message to the second host, the second host replies a response message to the first host, where the response message includes the path status information of the path through which the message is transmitted. In the embodiment of the present invention, the first slice includes at least one packet.
The switch in the embodiment of the present invention may be any leaf switch under the Clos architecture.
It should be noted that the CONGA scheme transmits data based on flowlets, where a flowlet can be understood as a burst of packets within a TCP flow. Generally, if the time gap between two successive packets is greater than a configured value (e.g., 500 microseconds), the later packet is determined to be the first packet of a new flowlet; if the time gap between the two packets is less than the configured value (e.g., 500 microseconds), the two packets are determined to belong to the same flowlet. Specifically, as shown in fig. 6, flow A contains 2 flowlets, A1 and A2, where the gap between A1 and A2 is greater than the configured value.
However, in the embodiment of the present invention, the first host determines whether the path for sending the first slice is in a congestion state according to the path status information of the first path, and further determines the sending end time of the first slice, so as to finally achieve the purpose of dynamic slicing, and further avoid the path congestion to the maximum extent.
In addition, the first host dynamically slices the first data stream according to the path congestion condition, so that the slicing times of the first data stream can be effectively reduced.
In the embodiment of the present invention, on one hand, the first host slices the first data stream by adding an index to each slice of the first data stream, so that a problem that a congestion window of a transport layer is not matched with an actual path congestion condition (mismatch) under a single stream slice condition can be avoided to the greatest extent. On the other hand, the first host realizes dynamic slicing according to the congestion status of the first path corresponding to the first slice in the first data flow, so that the first data flow can be effectively transmitted in a path with better path status, and the transmission efficiency is effectively improved.
In the embodiment of the present invention, the first host ends the first slice when the path status information of the first path indicates that the first path is in the congestion state, and then sends the second slice through the second path. That is, the first host needs to detect the path state of the first path in real time or non-real time (periodically).
Optionally, the path condition information in the embodiment of the present invention may include parameters such as round-trip time (RTT) and/or Explicit Congestion Notification (ECN).
For example, the first host may determine the path status of the first path according to parameters such as RTT and/or ECN of the first path. The RTT is an important performance indicator, and indicates a total time delay from when the transmitting end transmits data to when the transmitting end receives an acknowledgement message from the receiving end (the receiving end immediately transmits the acknowledgement message after receiving the data). For ECN, typically, when a host supporting ECN sends a packet, the ECN field in the packet is 01 or 10, if a router on the path supports ECN and experiences congestion, the router modifies the value of the ECN field to 11, and if the field is already set to 11, the downstream router does not modify the value of the field.
For example, if the RTT of the first path exceeds a set threshold and/or the ECN field of the packet is modified, it is determined that the path status information of the first path indicates that the first path is in the congestion state, the first host ends the first slice, that is, the first host performs a slicing operation, uses a subsequent packet of the first data flow as a second slice, and reselects a new path for the second slice among the multiple available paths.
The following describes a data transmission method according to an embodiment of the present invention after the path status information of the first path indicates that the first path is in a congestion state, that is, after the first slice is ended.
Optionally, when the path status information of the first path indicates that the first path is in a congested state, the first host finishes sending the first slice, and after the first slice, adds a second index to each message of a second slice of the first data stream, where the second index is used by the switch to determine, in the multiple available paths, a second path corresponding to the second index, and the first path is different from the second path; the first host sends a second slice to the switch, so that the switch forwards the second slice to the second host through the second path.
In other words, the switch receives a second slice of the first data stream sent by the first host after the first slice, the second slice including a second index; the switch forwards the second slice to a second host via the second path.
In the data transmission method of the embodiment of the present invention, sending the first slice is finished only when the path status information of the first path indicates that the first path is in a congestion state, so that the first host can reduce the number of slices of the first data stream to the maximum extent.
As an embodiment, optionally, the first host may end sending the second slice when the path status information of the second path shows that the second path is in a congested state.
As another embodiment, optionally, after sending the second slice with a predetermined length, the first host ends sending the second slice or continues sending the second slice according to the path condition information of the first path and the path condition information of the second path. For example, the first host may end transmission of the second slice or continue transmission of the second slice according to the path status information of the second path after transmitting the second slice of 64 KB. It should be understood that the predetermined length may be any value that is preset.
In other words, the switch ends or continues to receive the second slice after receiving the second slice of a predetermined length.
For example, when the path status information of the first path indicates that the first path is not in a congestion state, the first host finishes sending the second slice; the first host adds the first index to each message of a third slice of the first data stream after the second slice; the first host sends the third slice of the first data stream to the switch, the third slice including the first index, so that the switch forwards the third slice to the second host via the first path.
For another example, the first host continues to send the second slice when the path status information of the first path indicates that the first path is in a congested state and the path status information of the second path indicates that the second path is not in a congested state.
For another example, when the path status information of the first path indicates that the first path is in a congested state and the path status information of the second path indicates that the second path is also in a congested state, the first host ends sending the second slice; the first host adds a third index to each message of a third slice of the first data stream after the second slice, where the third index is used by the switch to determine, among the available paths, a third path corresponding to the third index, and the first path, the second path and the third path are different from each other; the first host sends the third slice to the switch, the third slice including the third index, so that the switch forwards the third slice to the second host on the third path.
It should be understood that, in the embodiment of the present invention, the first host may determine to end sending the current slice according to the path condition information of the current path, that is, dynamically slice the data stream according to the path condition information of the current transmission path of the data stream; the data stream may also be statically sliced with a predetermined length, or the data stream may also be sliced in a dynamic and static combined manner, which is not limited in the embodiment of the present invention.
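The three cases enumerated above (end and return to the first path; continue on the second path; move to a third path) can be summarized in one decision function. This is a plain restatement of the examples as code; the string return values are illustrative labels, not protocol elements.

```python
def decide_after_fixed_length(first_congested, second_congested):
    """After sending a predetermined length (e.g. 64 KB) of the second
    slice, decide the next action from the congestion states of the
    first and second paths, following the three example cases."""
    if not first_congested:
        # First path recovered: end the second slice, reuse the first index.
        return "end_second_slice_return_to_first_path"
    if not second_congested:
        # Only the first path is congested: keep sending the second slice.
        return "continue_second_slice"
    # Both paths congested: end the second slice and pick a third path.
    return "end_second_slice_use_third_path"
```

This captures the combined dynamic-and-static slicing manner: the length check is static, while the choice at the checkpoint is driven by live path status information.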
130, the switch forwards the first slice to a second host via the first path.
Specifically, the switch receives a first slice of a first data stream sent by a first host, where the first slice includes a first index used to determine, among the multiple available paths from the first host to the second host, the first path corresponding to the first index; the switch then forwards the first slice to the second host through the first path.
Optionally, before the switch forwards the first slice to the second host through the first path, the switch establishes a correspondence table, where the correspondence table is used to record an index corresponding to each of the plurality of available paths; and the switch determines the first path according to the corresponding relation table and the first index.
It should be understood that, in the embodiment of the present invention, the switch may establish the correspondence table and then forward the first slice based on the correspondence table. The switch may also forward the first slice in other ways.
For example, in the 2-stage Clos architecture, after receiving the first slice, the switch may directly establish a mapping table between the slice index and an ECMP egress port, and forward subsequent slices according to the mapping table. Referring to fig. 7, after receiving the first slice including the first index, the switch establishes a mapping relationship between the first index and the egress port P11. Optionally, the switch may establish the mappings between all ECMP egress ports and indexes at once, or may establish the mapping between an index and an ECMP egress port on demand, according to the index carried in each newly received slice.
For another example, the switch may re-determine the transmission path by round-robin (RR), and forward the first slice to the second host on the re-determined transmission path, and so on.
It should be understood that RR is merely an exemplary illustration; alternatively, the RR variants include: weighted round robin (WRR), deficit round robin (DRR), and urgent round robin (URR).
It should also be noted that the method of the embodiment of the present invention may be executed by the network processing part (network stack) of the kernel code of a conventional Windows/Linux operating system, or may be directly executed by the host. The network processing part may be, for example, the TCP/IP protocol stack functional module of a Linux system, or the TCP/IP protocol stack functional module of a cloud or virtualized operating system (VMware, Xen or OpenStack), and the like.
Based on the same inventive concept as the method described above, an embodiment of the present invention further provides a host, and fig. 8 is a schematic block diagram of a host 200 according to an embodiment of the present invention.
As shown in fig. 8, the host 200 includes:
a processing unit 210, configured to add a first index to each packet of a first slice of a first data stream, where the first index is used for determining, by a switch, a first path corresponding to the first index among multiple available paths from a first host to a second host;
a transceiving unit 220, configured to send a first slice of the first data stream to the switch, where a last packet of the first slice is determined by the first host according to the path status information of a first path for forwarding the first slice, so that the switch forwards the first slice to the second host through the first path.
Optionally, the processing unit 210 is further configured to:
when the path status information of the first path indicates that the first path is in a congested state, the sending of the first slice is finished, a second index is added to each message of a second slice of the first data stream, where the second index is used by the switch to determine a second path corresponding to the second index in the multiple available paths, and the first path is different from the second path, where the transceiver unit 220 is further configured to:
sending the second slice to the switch such that the switch forwards the second slice to the second host via the second path.
Optionally, the processing unit 210 is further configured to:
and after the second slice with the preset length is sent, finishing sending the second slice or continuing sending the second slice according to the path condition information of the first path and the path condition information of the second path.
Optionally, the processing unit 210 is specifically configured to:
when the path condition information of the first path shows that the first path is not in a congestion state, ending sending the second slice; adding the first index in each message of a third slice of the first data stream after the second slice; wherein, the transceiver unit 220 is further configured to:
sending the third slice to the switch, the third slice including the first index, such that the switch forwards the third slice to the second host via the first path.
Optionally, the processing unit 210 is specifically configured to:
and when the path condition information of the first path shows that the first path is in a congestion state and the path condition information of the second path shows that the second path is not in the congestion state, continuing to send the second slice.
Optionally, the processing unit 210 is specifically configured to:
when the path condition information of the first path shows that the first path is in a congestion state and the path condition information of the second path shows that the second path is in the congestion state, ending sending the second slice; after the second slice, adding a third index in each message of a third slice of the first data stream, where the third index is used for the switch to determine a third path corresponding to the third index in the multiple available paths, and the first path, the second path, and the third path are different from each other; wherein, the transceiver unit 220 is further configured to:
sending the third slice to the switch, the third slice including the third index, such that the switch forwards the third slice to the second host on the third path.
Optionally, the transceiver unit 220 is further configured to:
before adding the first index in each message of the first slice of the first data stream, the first host obtains the total number of the available paths.
Optionally, the transceiver unit 220 is specifically configured to:
sending a first Transmission Control Protocol (TCP) link request message to the switch, wherein the TCP link request message comprises the first request message, and the first request message is used for requesting to acquire the total number of the available paths, so that the switch adds the total number of the available paths in the first TCP link request message to generate a second TCP link request message which needs to be sent to the second host; and receiving a response message of the second TCP link request message forwarded by the switch, wherein the response message comprises the total number of the plurality of available paths.
Optionally, the transceiver unit 220 is further configured to:
and receiving the path condition information of the first path sent by the server before adding the second index in each message of the second slice of the first data stream.
Optionally, the path condition information comprises a round trip delay RTT and/or an explicit congestion notification ECN.
It should be noted that in the embodiment of the present invention, the processing unit 210 may be implemented by a processor, and the transceiver unit 220 may be implemented by a processor control communication port.
As shown in fig. 9, host 300 may include a processor 310, a port 320, and a memory 330. Memory 330 may be used to store, among other things, code, instructions for execution by processor 310.
By way of example, and not limitation, the processor 310, the port 320, and the memory 330 may be communicatively coupled via, for example, a bus.
Based on the same inventive concept as the above method, an embodiment of the present invention further provides a switch, and fig. 10 is a schematic block diagram of a switch 400 according to an embodiment of the present invention.
As shown in fig. 10, the switch 400 includes a first transceiving unit 410 and a second transceiving unit 420.
The first transceiving unit 410 is configured to receive a first slice of a first data stream sent by a first host, where a last packet of the first slice is determined by the first host according to path status information of a first path for forwarding the first slice, the first slice includes a first index, and the first index is used for the switch to determine, from among multiple available paths from the first host to a second host, the first path corresponding to the first index;
a second transceiving unit 420, configured to forward the first slice to a second host through the first path.
Optionally, the switch further comprises a processing unit, the processing unit being configured to:
establishing a corresponding relation table, wherein the corresponding relation table is used for recording indexes corresponding to each available path in the multiple available paths;
and determining the first path according to the corresponding relation table and the first index.
Optionally, the first transceiver unit 410 is further configured to:
and after the first slice, receiving a second slice of the first data stream sent by the first host, the second slice including a second index, the second index being used for the switch to determine a second path corresponding to the second index among the plurality of available paths, the first path being different from the second path.
Wherein, the second transceiver unit 420 is further configured to: forwarding the second slice to a second host via the second path.
Optionally, the first transceiver unit 410 is further configured to:
after receiving the second slice of a predetermined length, ending or continuing to receive the second slice.
Optionally, the first transceiver unit 410 is further configured to:
before receiving a first slice of a first data stream sent by a first host, receiving a first Transmission Control Protocol (TCP) link request message sent by the first host, wherein the TCP link request message comprises a first request message, and the first request message is used for requesting to acquire the total number of a plurality of available paths from the first host to a second host; adding the total number of the available paths in the first TCP link request message to form a second TCP link request message; wherein, the second transceiver unit 420 is further configured to: sending the second TCP link request message to the second host; receiving a response message of the second TCP link request message sent by the second host, wherein the response message comprises the total number of the available paths; and sending the response message to the first host.
It should be noted that in the embodiment of the present invention, the first transceiver unit 410 and the second transceiver unit 420 may be implemented by different ports controlled by a processor, and the processing unit may be implemented by the processor.
As shown in fig. 11, switch 500 may include a processor 510, a plurality of communication ports 520, and a memory 530. Memory 530 may be used to store code, instructions, etc. that are executed by processor 510, among other things.
By way of example, and not limitation, processor 510, communication port 520, and memory 530 may be communicatively coupled via, for example, a bus.
It should be noted that the method executed by the processor is consistent with the content of the foregoing method embodiment, and is not described again.
It should be noted that the processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, or a discrete hardware component.
It will be appreciated that in embodiments of the invention, the memory may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It is to be understood that the terminology used in the embodiments of the invention and the appended claims is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention.
For example, the term "and/or" in the embodiments of the present invention describes an association relationship between associated objects and indicates that three relationships may exist. Specifically, "A and/or B" may represent three cases: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Also for example, as used in the examples of the invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Also for example, the terms "first", "second", "third", and the like may be used in embodiments of the invention to describe various slices, indexes, and hosts, but these slices, indexes, and hosts should not be limited by these terms. These terms are only used to distinguish one slice, index, or host from another.
The above description is only a specific implementation of the embodiments of the present invention, but the scope of the embodiments of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present invention, and all such changes or substitutions should be covered by the scope of the embodiments of the present invention. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.
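The sender-side slice-switching behaviour of the embodiments above can be sketched as follows. This is a simplified, illustrative model, not the patent's implementation: `congested()` stands in for the path condition information, `probe_len` for the predetermined length after which the host re-checks the old path, and all names are assumptions.

```python
# Hypothetical sketch of the host-side slicing logic: tag every packet of
# the current slice with a path index, and end the slice (rotating to a
# different index) once the current path reports congestion.

def send_stream(packets, congested, num_paths, probe_len=8):
    """Yield (path_index, packet) pairs.

    congested(i) models the path condition information of path i.
    After probe_len packets on a new slice, the host re-checks the old
    path and returns to it if it is no longer congested.
    """
    index = 0          # first index -> first path
    sent_on_slice = 0  # packets sent on the current slice
    prev = None        # index of the path the previous slice used, if any
    for pkt in packets:
        if congested(index):
            # end this slice; pick a different index for the next slice
            prev, index = index, (index + 1) % num_paths
            sent_on_slice = 0
        elif prev is not None and sent_on_slice >= probe_len and not congested(prev):
            # the old path has recovered: return subsequent packets to it
            index, prev = prev, None
            sent_on_slice = 0
        yield index, pkt
        sent_on_slice += 1
```

If the new path turns out to be congested as well, the rotation `(index + 1) % num_paths` moves the next slice to yet another index, approximating the three-path case; if the old path stays congested, the host simply continues on the new path.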

Claims (26)

1. A method of transmitting data, the method comprising:
a first host adds a first index in each message of a first slice of a first data stream, wherein the first index is used for enabling a switch connected with the first host to determine a first path corresponding to the first index in a plurality of available paths from the first host to a second host;
the first host sends a first slice of the first data stream to the switch, and the last message of the first slice is determined by the first host according to the path condition information of a first path for forwarding the first slice, so that the switch forwards the first slice to the second host through the first path;
the method further comprises the following steps:
the first host finishes sending the first slice when the path condition information of the first path shows that the first path is in a congestion state;
adding a second index to each message of a second slice of the first data stream, wherein the second index is used for enabling the switch to determine a second path corresponding to the second index in the plurality of available paths, and the second path is different from the first path;
and the first host sends the second slice to the switch, so that the switch forwards the second slice to the second host through the second path.
2. The method of claim 1, further comprising:
and after the first host sends the second slice with a preset length, finishing sending the second slice or continuing to send the second slice according to the path condition information of the first path and the path condition information of the second path.
3. The method of claim 2, wherein after sending the second slice with a predetermined length, the first host ends sending the second slice or continues sending the second slice according to the path status information of the first path and the path status information of the second path, and wherein the method comprises:
the first host finishes sending the second slice when the path condition information of the first path shows that the first path is not in a congestion state;
the method further comprises the following steps:
the first host adding the first index in each message of a third slice of the first data stream after the second slice;
the first host sends the third slice to the switch, so that the switch forwards the third slice to the second host through the first path.
4. The method of claim 2, wherein after sending the second slice with a predetermined length, the first host ends sending the second slice or continues sending the second slice according to the path status information of the first path and the path status information of the second path, and wherein the method comprises:
and the first host continues to send the second slice when the path condition information of the first path shows that the first path is in a congestion state and the path condition information of the second path shows that the second path is not in the congestion state.
5. The method of claim 2, wherein after sending the second slice with a predetermined length, the first host ends sending the second slice or continues sending the second slice according to the path status information of the first path and the path status information of the second path, and wherein the method comprises:
the first host finishes sending the second slice when the path condition information of the first path shows that the first path is in a congestion state and the path condition information of the second path shows that the second path is in a congestion state;
the method further comprises the following steps:
the first host adds a third index to each message of a third slice of the first data stream after the second slice, wherein the third index is used for enabling the switch to determine a third path corresponding to the third index in the plurality of available paths, and the first path, the second path, and the third path are different from each other;
the first host sends the third slice to the switch so that the switch forwards the third slice to the second host on the third path.
6. The method of any of claims 1 to 5, wherein prior to the first host adding the first index in each packet of the first slice of the first data stream, the method further comprises:
the first host obtains the total number of the plurality of available paths, so that the first host determines the number of indexes which can be added in the first data stream according to the total number of the plurality of available paths.
7. The method of claim 6, wherein obtaining, by the first host, the total number of the plurality of available paths comprises:
the first host sends a first Transmission Control Protocol (TCP) link request message to the switch, wherein the first TCP link request message comprises a first request message, and the first request message is used for requesting to acquire the total number of the plurality of available paths, so that the switch adds the total number of the plurality of available paths in the first TCP link request message to generate a second TCP link request message which needs to be sent to the second host;
and the first host receives a response message of the second TCP link request message forwarded by the switch, wherein the response message comprises the total number of the available paths.
8. The method of any of claims 1 to 5, wherein prior to the first host adding a second index in each packet of a second slice of the first data stream, the method further comprises:
and the first host receives the path condition information of the first path sent by the server corresponding to the second host through the switch.
9. A first host, comprising:
the processing unit is used for adding a first index into each message of a first slice of a first data stream, wherein the first index is used for determining a first path corresponding to the first index in a plurality of available paths from a first host to a second host by a switch;
a transceiving unit, configured to send a first slice of the first data stream to the switch, where a last packet of the first slice is determined by the first host according to path status information of a first path for forwarding the first slice, and the first slice includes the first index, so that the switch forwards the first slice to the second host through the first path;
the processing unit is further to:
when the path condition information of the first path shows that the first path is in a congestion state, ending sending the first slice;
after the first slice, adding a second index to each message of a second slice of the first data flow, where the second index is used by the switch to determine a second path corresponding to the second index among the multiple available paths, and the second path is different from the first path, where the transceiver unit is further configured to:
and sending the second slice to the switch so that the switch forwards the second slice to the second host through the second path.
10. The first host of claim 9, wherein the processing unit is further configured to:
and after the second slice with the preset length is sent, finishing sending the second slice or continuing sending the second slice according to the path condition information of the first path and the path condition information of the second path.
11. The first host of claim 10, wherein the processing unit is specifically configured to:
when the path condition information of the first path shows that the first path is not in a congestion state, ending sending the second slice;
adding the first index in each message of a third slice of the first data stream after the second slice;
wherein the transceiver unit is further configured to:
sending a third slice to the switch, the third slice including the first index, so that the switch forwards the third slice to the second host through the first path.
12. The first host of claim 10, wherein the processing unit is specifically configured to:
and when the path condition information of the first path shows that the first path is in a congestion state and the path condition information of the second path shows that the second path is not in the congestion state, continuously sending the second slice.
13. The first host of claim 10, wherein the processing unit is specifically configured to:
when the path condition information of the first path shows that the first path is in a congestion state and the path condition information of the second path shows that the second path is in the congestion state, ending sending the second slice;
after the second slice, adding a third index to each message of a third slice of the first data stream, where the third index is used by the switch to determine a third path corresponding to the third index among the multiple available paths, and the first path, the second path, and the third path are different from each other;
wherein the transceiver unit is further configured to:
sending a third slice to the switch, the third slice including the third index, such that the switch forwards the third slice to the second host on the third path.
14. The first host of any one of claims 9 to 13, wherein the transceiver unit is further configured to:
before adding a first index into each message of a first slice of a first data stream, acquiring the total number of the available paths, so that the first host determines the number of indexes which can be added in the first data stream according to the total number of the available paths.
15. The first host of claim 14, wherein the transceiver unit is specifically configured to:
sending a first Transmission Control Protocol (TCP) link request message to the switch, wherein the first TCP link request message comprises a first request message, and the first request message is used for requesting to acquire the total number of the plurality of available paths, so that the switch adds the total number of the plurality of available paths in the first TCP link request message to generate a second TCP link request message which needs to be sent to the second host;
and receiving a response message of the second TCP link request message forwarded by the switch, wherein the response message comprises the total number of the plurality of available paths.
16. The first host of any one of claims 9 to 13, wherein the transceiver unit is further configured to:
before adding a second index to each message of a second slice of the first data stream, receiving, by the switch, path status information of the first path sent by a server corresponding to the second host.
17. A method of transmitting data, the method comprising:
a first host adds a first index in each message of a first slice of a first data stream, wherein the first index is used for enabling a switch connected with the first host to determine a first path corresponding to the first index in a plurality of available paths from the first host to a second host;
the first host sends a first slice of the first data flow to the switch, and the last message of the first slice is determined by the first host according to the path condition information of a first path for forwarding the first slice;
the switch forwards the first slice to the second host through the first path;
the method further comprises the following steps:
the first host finishes sending the first slice when the path condition information of the first path shows that the first path is in a congestion state;
the first host adds a second index to each message of a second slice of the first data stream, wherein the second index is used for enabling the switch to determine a second path corresponding to the second index in the plurality of available paths, and the second path is different from the first path;
the first host sending the second slice to the switch;
and the switch forwards the second slice to the second host through the second path.
18. The method of claim 17, further comprising:
and after the first host sends the second slice with a preset length, finishing sending the second slice or continuing to send the second slice according to the path condition information of the first path and the path condition information of the second path.
19. The method of claim 18, wherein after sending the second slice with a predetermined length, the first host ends sending the second slice or continues sending the second slice according to the path condition information of the first path and the path condition information of the second path, and wherein the method comprises:
the first host finishes sending the second slice when the path condition information of the first path shows that the first path is not in a congestion state;
the method further comprises the following steps:
the first host adding the first index in each message of a third slice of the first data stream after the second slice;
the first host sends the third slice to the switch;
the switch forwards the third slice to the second host through the first path.
20. The method of claim 18, wherein after sending the second slice with a predetermined length, the first host ends sending the second slice or continues sending the second slice according to the path condition information of the first path and the path condition information of the second path, and wherein the method comprises:
and the first host continues to send the second slice when the path condition information of the first path shows that the first path is in a congestion state and the path condition information of the second path shows that the second path is not in the congestion state.
21. The method of claim 18, wherein after sending the second slice with a predetermined length, the first host ends sending the second slice or continues sending the second slice according to the path condition information of the first path and the path condition information of the second path, and wherein the method comprises:
the first host finishes sending the second slice when the path condition information of the first path shows that the first path is in a congestion state and the path condition information of the second path shows that the second path is in a congestion state;
the method further comprises the following steps:
the first host adds a third index to each message of a third slice of the first data stream after the second slice, wherein the third index is used for enabling the switch to determine a third path corresponding to the third index in the plurality of available paths, and the first path, the second path, and the third path are different from each other;
the first host sends the third slice to the switch;
the switch forwards the third slice to the second host on the third path.
22. The method of any of claims 17 to 21, wherein prior to the first host adding the first index in each packet of the first slice of the first data stream, the method further comprises:
the first host obtains the total number of the plurality of available paths, so that the first host determines the number of indexes which can be added in the first data stream according to the total number of the plurality of available paths.
23. The method of claim 22, wherein obtaining the total number of the plurality of available paths by the first host comprises:
the first host sends a first Transmission Control Protocol (TCP) link request message to the switch, wherein the first TCP link request message comprises a first request message, and the first request message is used for requesting to acquire the total number of the plurality of available paths;
the switch adds the total number of the multiple available paths in the first TCP link request message to generate a second TCP link request message which needs to be sent to the second host;
the switch sends the second TCP link request message to the second host;
the switch receives a response message of the second TCP link request message sent by the second host, wherein the response message comprises the total number of the available paths;
and the switch sends the response message to the first host.
24. The method of any of claims 17 to 21, wherein prior to the first host adding a second index in each packet of a second slice of the first data stream, the method further comprises:
and the first host receives the path condition information of the first path sent by the server corresponding to the second host through the switch.
25. The method of any of claims 17 to 21, wherein before the switch forwards the first slice to the second host through the first path, the method further comprises:
the switch establishes a corresponding relation table, and the corresponding relation table is used for recording indexes corresponding to each available path in the multiple available paths;
and the switch determines the first path according to the corresponding relation table and the first index.
26. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program that, when executed, causes a host to execute the method of transmitting data according to any one of claims 1 to 8.
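The correspondence table of claim 25 can be sketched as a plain index-to-path mapping consulted per packet. This is an illustrative model only; the class, the representation of a path as an egress port name, and the packet dictionary are all assumptions.

```python
# Illustrative sketch of the switch behaviour in claim 25: a correspondence
# table recording the index corresponding to each available path, used to
# pick the forwarding path for each received packet.

class SliceSwitch:
    def __init__(self, ports):
        # correspondence table: index i -> available path i
        # (a path is modeled here simply as an egress port name)
        self.table = {i: port for i, port in enumerate(ports)}

    def forward(self, packet: dict) -> str:
        """Return the egress path for a packet tagged with a path index."""
        return self.table[packet["index"]]
```

Because every packet of a slice carries the same index, all packets of one slice resolve to the same entry of the table and are forwarded on the same path.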
CN201710359609.1A 2017-05-19 2017-05-19 Method, host and switch for transmitting data Active CN108965121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710359609.1A CN108965121B (en) 2017-05-19 2017-05-19 Method, host and switch for transmitting data


Publications (2)

Publication Number Publication Date
CN108965121A CN108965121A (en) 2018-12-07
CN108965121B true CN108965121B (en) 2021-06-01

Family

ID=64462122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710359609.1A Active CN108965121B (en) 2017-05-19 2017-05-19 Method, host and switch for transmitting data

Country Status (1)

Country Link
CN (1) CN108965121B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10897423B2 (en) * 2019-05-14 2021-01-19 Vmware, Inc. Congestion avoidance in a slice-based network
US11588733B2 (en) 2019-05-14 2023-02-21 Vmware, Inc. Slice-based routing
US10892994B2 (en) 2019-05-14 2021-01-12 Vmware, Inc. Quality of service in virtual service networks
US11012288B2 (en) 2019-05-14 2021-05-18 Vmware, Inc. Congestion avoidance in a slice-based network
CN113810284A (en) * 2020-06-16 2021-12-17 华为技术有限公司 Method and device for determining message sending path
CN111817973B (en) * 2020-06-28 2022-03-25 电子科技大学 Data center network load balancing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102006225A (en) * 2010-11-19 2011-04-06 华为技术有限公司 Network congestion processing method and device
CN105873162A (en) * 2016-06-20 2016-08-17 沈阳化工大学 Wireless sensor network data flow rate shunting routing method based on multipath
CN106059941A (en) * 2016-07-14 2016-10-26 电子科技大学 Backbone network traffic scheduling method for eliminating link congestion

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5640982B2 (en) * 2009-09-14 2014-12-17 日本電気株式会社 COMMUNICATION SYSTEM, TRANSFER NODE, ROUTE MANAGEMENT SERVER, COMMUNICATION METHOD, AND PROGRAM
US9350665B2 (en) * 2012-08-31 2016-05-24 Cisco Technology, Inc. Congestion mitigation and avoidance
CN104579961B (en) * 2013-10-11 2018-09-07 ***通信集团公司 The dispatching method and device of data message
CN105933232B (en) * 2016-03-29 2018-10-23 东北大学 Support the Multipath Transmission control terminal and method of multi-service data transmission demand
CN106357547A (en) * 2016-09-08 2017-01-25 重庆邮电大学 Software-defined network congestion control algorithm based on stream segmentation


Also Published As

Publication number Publication date
CN108965121A (en) 2018-12-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant