CN108337183B - Method for balancing network flow load of data center - Google Patents


Info

Publication number: CN108337183B
Authority: CN (China)
Prior art keywords: port number, header, field, random number, packet
Legal status: Active
Application number: CN201711372360.4A
Other languages: Chinese (zh)
Other versions: CN108337183A
Inventors: 唐艺舟, 田臣
Current assignee: Nanjing University
Original assignee: Nanjing University

Events: application filed by Nanjing University; priority to CN201711372360.4A; publication of CN108337183A; application granted; publication of CN108337183B; legal status active; anticipated expiration.

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/12: Avoiding congestion; Recovering from congestion
    • H04L 47/125: Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering

Abstract

The invention discloses a method for balancing data center network flow load. It leverages the ECMP method already widely deployed in network equipment: the sending host and the receiving host each process every data packet of the same network flow, so that the packets are spread uniformly over different equal-cost links, achieving finer-grained load balancing. The sender generates a random number for each packet, modifies the packet's uniquely determining tuple according to that random number, and embeds the random number in a field of the packet header; the receiver extracts the random number from that header field and restores the original tuple. The invention effectively mitigates the poor load-balancing behavior of the ECMP method when network flows of different sizes coexist or when hash collisions occur, approaching an optimal load-balancing effect, and it is simple to implement and efficient to run.

Description

Method for balancing network flow load of data center
Technical Field
The invention belongs to the field of computer networks, and in particular relates to a method, deployable in a computer data center, for balancing data center network flow load.
Background
With the continuous development of society, data centers have become an indispensable part of daily life. A data center provides back-end support for web and mobile applications, serving fields such as cloud computing, social networking, payments, and entertainment. A network flow is a sequence of data packets transmitted through a network protocol stack (usually the TCP/IP stack); each flow can be uniquely determined by a tuple taken from its packets, most commonly (source IP, destination IP, source port, destination port). Network flow load balancing has always been an important problem for data centers, because the servers hosting different applications have inconsistent network demands: search and live streaming require low latency, while storage backup requires high throughput. Load balancing ensures that each application obtains an appropriate amount of network service, avoiding both the case where an application monopolizes network resources and the case where it is starved of them.
The most common network flow load balancing method today is Equal-Cost Multi-Path routing (ECMP); its main idea is briefly introduced here. Because a data center network usually has a fat-tree topology, there are multiple equal-cost communication links along different paths between two servers, serving as redundant backups for one another. If the network flows between the two servers are dispersed uniformly over these links, that is, no link is overloaded while others sit idle, the purpose of load balancing is achieved. The dispersal method ECMP employs is a hash algorithm. The basic idea of hashing is to map input data of arbitrary size into a fixed-size output range. Taking a flow's uniquely determining tuple as the hash input, and the set of candidate link identifiers as the hash output range, yields the required mapping. As long as the chosen hash function behaves well enough, different network flows are dispersed evenly across the links. ECMP is a proactive, stateless method that can run hop-by-hop in routers and switches; it is simple to implement and therefore widely used.
The ECMP method has two major problems. First, if network flows differ in size, the load-balancing effect suffers: ECMP balances load at the granularity of whole flows, so even when a flow with a large transmission volume and a flow with a small one are placed on different links, the result is clearly unbalanced. Second, hash collisions: when a collision occurs, the colliding flows cannot be dispersed evenly. Choosing a better-behaved hash function alleviates this problem but cannot eradicate it.
Network flow load balancing is often implemented on top of network packet processing techniques, such as extracting and modifying header fields to influence the link on which a packet is forwarded. There are three common places to process packets: in a virtual machine monitor; in the virtual switch Open vSwitch; or in the Linux kernel network protocol stack. Network card (NIC) offloading is one of the technologies widely used in current data center networks. When offloading is enabled, the Linux kernel network protocol stack can process a packet of up to nearly 64KB at a time, instead of packets bounded by the traditional maximum segment size (MSS) of 1460 bytes. Segmentation and checksum calculation are completed by the network card, which reduces CPU load and helps the whole system scale in a high-speed data center network.
If a network flow is divided into segments in some way, each segment can be called a sub-flow of the original flow. In the invention, the large packet that the Linux kernel network protocol stack processes in one pass when NIC offloading is enabled, together with the several wire packets into which the NIC subsequently splits it, is called a sub-flow.
Disclosure of Invention
In view of the current state of the prior art described above, the technical problem to be solved by the present invention is to provide a data center network flow load balancing method that greatly alleviates ECMP's defects when network flows of different sizes coexist or when hash collisions occur, and that can actually be deployed in a data center.
In order to achieve the technical purpose, the technical scheme adopted by the invention is as follows:
a method for balancing network flow load of a data center is characterized in that: by utilizing an ECMP method widely deployed in network equipment, a host sender and a host receiver respectively process each data packet in the same network flow, so that the packets are uniformly dispersed on different equivalent links to realize load balancing with finer granularity; the processing method of the host sender is to generate a random number for each packet, modify the unique determined tuple according to the random number, and embed the random number into a certain field of the packet header; the processing method of the host receiver is to extract a random number from a field of a packet header and restore a unique determined tuple.
In order to optimize the technical scheme, the specific measures adopted further comprise:
the processing method of the host sender specifically comprises the following steps:
A1) acquire the data packet to be sent from the Linux kernel network protocol stack;
A2) acquire the source port number and destination port number of the data packet;
A3) generate a random number;
A4) compute a new source port number and a new destination port number;
A5) write the generated random number, the new source port number and the new destination port number back into the data packet.
The processing method of the host receiver specifically comprises the following steps:
B1) acquire the received data packet from the Linux kernel network protocol stack;
B2) from the data packet, read the random number, the new source port number and the new destination port number written by the sender;
B3) compute and restore the original source port number and destination port number;
B4) write the restored source port number and destination port number back into the data packet.
In step A1) above, the data packet to be sent is obtained from the Linux kernel network protocol stack by implementing a custom callback function for the NF_INET_LOCAL_OUT hook of the Netfilter framework; the resulting packet is represented in the form of a sk_buff structure.
If the transport layer protocol is TCP in step A2), a pointer to the TCP header in the sk_buff is obtained through the tcp_hdr function, and, following the definition of the TCP header structure tcphdr, the source port number and destination port number of the data packet are read from its source and dest fields;
if the transport layer protocol is UDP, a pointer to the UDP header in the sk_buff is obtained through the udp_hdr function, and, following the definition of the UDP header structure udphdr, the source port number and destination port number are read from its source and dest fields.
Step A3) above uses the true random number generator function get_random_bytes provided by the Linux kernel, which allows specifying how many bytes of storage space the obtained random number occupies.
The new source port number and new destination port number in step A4) are calculated by taking the lower 6 bits of the random number and XOR-ing them with the lower 6 bits of each of the two port numbers, yielding two new port numbers; both remain within the valid range [0, 65535].
If the transport layer protocol is TCP in step A5), the new source port number and new destination port number are written back to the source and dest fields of the tcphdr structure; if the transport layer protocol is UDP, they are written back to the source and dest fields of the udphdr structure. In both cases the generated random number is placed in the differentiated services field of the IP header.
The data packet to be received in step B1) is obtained by implementing a custom callback function for the NF_INET_LOCAL_IN hook of the Netfilter framework in the Linux kernel network protocol stack;
if the transport layer protocol is TCP in step B2), a pointer to the TCP header in the sk_buff is obtained through the tcp_hdr function, and, following the definition of the tcphdr structure, the new source port number and new destination port number of the data packet are read from its source and dest fields; if the transport layer protocol is UDP, a pointer to the UDP header is obtained through the udp_hdr function, and the new source port number and new destination port number are read from the source and dest fields of the udphdr structure. In either case, a pointer to the IP header in the sk_buff is obtained through the ip_hdr function, and the random number is read from the tos field of the IP header structure iphdr;
in step B3), the original source port number and destination port number are calculated and restored by XOR-ing the lower 6 bits of the new source and destination port numbers with the random number once more;
in step B4), if the transport layer protocol is TCP, the restored source port number and destination port number are written back to the source and dest fields of the tcphdr structure; if the transport layer protocol is UDP, they are written back to the source and dest fields of the udphdr structure.
Compared with the prior art, the invention makes full use of the ECMP method already deployed in network equipment: routers and switches observe that each packet or sub-flow of the same flow carries a different uniquely determining tuple and disperse them evenly over the equal-cost links. ECMP's weakness with flows of varying sizes is alleviated, because every flow is divided into fine-grained units and the original flow size no longer matters. Its weakness under hash collisions is also alleviated: collisions no longer occur at flow granularity but only at the granularity of the fine-grained units, and even when one occurs, the resulting damage is far smaller than under the original ECMP method.
The uniquely determining tuple is modified at the sending host and restored at the receiving host, so the application layer never observes any change to the flow's tuple and needs no modification.
By adopting the technical scheme, the invention has the following advantages:
1. The poor load-balancing capability of the ECMP method when network flows of different sizes coexist or when hash collisions occur is mitigated, achieving a better load-balancing effect.
2. The software is simple to implement and efficient to run.
3. The ECMP method already deployed at scale in network equipment is fully reused; only a simple upgrade of the host's low-level software is needed, with no modification to host applications or network devices, so deployment in a real data center is feasible.
4. Because NIC offloading is considered in the design, the method can run in data center networks of 10Gbps, 25Gbps or even higher bandwidth, and is therefore scalable.
Drawings
FIG. 1 shows the location of the present invention implemented in the Linux kernel network protocol stack.
Fig. 2 is a flow chart of the present invention implemented on the host sender side.
FIG. 3 is a flow chart of the present invention implemented at the host receiver.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
In the following, the two terms packet and sub-flow are used interchangeably; both denote the finer-grained units into which a network flow is divided.
If a device in the data center has NIC offloading enabled, the custom callback functions on the NF_INET_LOCAL_IN and NF_INET_LOCAL_OUT hooks generally see large packets of nearly 64KB, and the method can be regarded as balancing load over sub-flows; if offloading is disabled for compatibility with older devices, load is balanced over individual packets. NIC offloading only affects scalability in high-speed data center networks, not the core principle.
Fig. 1 to Fig. 3 illustrate the operating principle of the invention. As shown, the data center network flow load balancing method of the invention leverages the ECMP method widely deployed in network equipment: the sending host and the receiving host each process every data packet or sub-flow of the same network flow, so that the packets are spread uniformly over different equal-cost links, achieving finer-grained load balancing. The sender generates a random number for each packet, modifies the uniquely determining tuple according to the random number, and embeds the random number in a field of the packet header; the receiver extracts the random number from that header field and restores the original tuple.
The processing method of the host sender specifically comprises the following steps:
A1) acquire the data packet to be sent from the Linux kernel network protocol stack;
A2) acquire the source port number and destination port number of the data packet;
A3) generate a random number;
A4) compute a new source port number and a new destination port number;
A5) write the generated random number, the new source port number and the new destination port number back into the data packet.
The processing method of the host receiver specifically comprises the following steps:
B1) acquire the received data packet from the Linux kernel network protocol stack;
B2) from the data packet, read the random number, the new source port number and the new destination port number written by the sender;
B3) compute and restore the original source port number and destination port number;
B4) write the restored source port number and destination port number back into the data packet.
In step A1), the data packet to be sent is obtained from the Linux kernel network protocol stack by implementing a custom callback function for the NF_INET_LOCAL_OUT hook of the Netfilter framework; the resulting packet is represented in the form of a sk_buff structure.
In step A2), if the transport layer protocol is TCP, a pointer to the TCP header in the sk_buff is obtained through the tcp_hdr function, and, following the definition of the TCP header structure tcphdr, the source port number and destination port number of the data packet are read from its source and dest fields;
if the transport layer protocol is UDP, a pointer to the UDP header in the sk_buff is obtained through the udp_hdr function, and, following the definition of the UDP header structure udphdr, the source port number and destination port number are read from its source and dest fields.
In step A3), the true random number generator function get_random_bytes provided by the Linux kernel is used; it allows specifying how many bytes of storage space the obtained random number occupies.
The method for calculating the new source port number and new destination port number in step A4) of the invention is to take the lower 6 bits of the random number and XOR them with the lower 6 bits of each of the two port numbers, yielding two new port numbers; both remain within the valid range [0, 65535].
In step A5) of the invention, if the transport layer protocol is TCP, the new source port number and new destination port number are written back to the source and dest fields of the tcphdr structure; if the transport layer protocol is UDP, they are written back to the source and dest fields of the udphdr structure. The generated random number is placed in a spare field of the packet header, or in the differentiated services field of the IP header.
In step B1), the received data packet is obtained by implementing a custom callback function for the NF_INET_LOCAL_IN hook of the Netfilter framework in the Linux kernel network protocol stack;
if the transport layer protocol is TCP in step B2), a pointer to the TCP header in the sk_buff is obtained through the tcp_hdr function, and, following the definition of the tcphdr structure, the new source port number and new destination port number of the data packet are read from its source and dest fields; if the transport layer protocol is UDP, a pointer to the UDP header is obtained through the udp_hdr function, and the new source port number and new destination port number are read from the source and dest fields of the udphdr structure. In either case, a pointer to the IP header in the sk_buff is obtained through the ip_hdr function, and the random number is read from the tos field of the IP header structure iphdr;
in step B3), the original source port number and destination port number are calculated and restored by XOR-ing the lower 6 bits of the new source and destination port numbers with the random number once more;
in step B4) of the invention, if the transport layer protocol is TCP, the restored source port number and destination port number are written back to the source and dest fields of the tcphdr structure; if the transport layer protocol is UDP, they are written back to the source and dest fields of the udphdr structure.
The invention introduces a random factor (a random number) into each unit of the divided network flow. The sending host modifies the uniquely determining tuple according to this random factor, so the ECMP method already deployed at scale in network equipment performs fine-grained load balancing, alleviating its defects when flows differ in size or when hash collisions occur. The receiving host restores each unit's original tuple, so upper-layer protocols need no modification.
The present invention requires modifying data packets at both the sending and receiving hosts. When implemented in the Linux kernel network protocol stack, the Netfilter framework can be used. As shown in Fig. 1, Netfilter provides 5 hooks at the IP layer of the protocol stack to intercept and process passing packets; the arrows indicate the direction of packet flow. The invention uses the NF_INET_LOCAL_OUT hook for the sender and the NF_INET_LOCAL_IN hook for the receiver, so that only the necessary packets are processed: these two hooks sit closest to the transport layer and do not see packets forwarded through NF_INET_FORWARD. Through custom hook callback functions, the sender and receiver can process packets at the bit level, because each packet is presented as a sk_buff structure whose fields can be accessed directly; for example, the source and dest fields yield the packet's transport-layer source and destination port numbers. Which kinds of packets are processed can be controlled precisely according to the application; in the invention, all packets passing through these two hooks are simply processed.
In the Linux kernel, the hook to use is selected by setting the hooknum field of an nf_hook_ops structure; binding the callback to the hook is done by pointing the structure's hook function pointer at the custom callback; finally, the nf_hook_ops structure is registered with the nf_register_hook function, so that packets traversing the network protocol stack are intercepted by the hook. The subsequent sender and receiver processing flows are completed inside the custom callbacks; their main steps are shown in Fig. 2 and Fig. 3 respectively.
NIC offloading affects the size of the data packet seen in the kernel network protocol stack, i.e. the size of the sk_buff structure. Offloads commonly used on the sending host are TCP Segmentation Offload (TSO) and Generic Segmentation Offload (GSO); their main difference is that GSO supports both TCP and UDP while TSO supports only TCP. The counterpart on the receiving host is Generic Receive Offload (GRO), which tries to merge many small packets into one large packet of nearly 64KB so the CPU can process it in one pass; it is the inverse of TSO and GSO. Whether offloading is enabled, and whether TSO or GSO is chosen when it is, has no effect on the method: in the Linux kernel protocol stack, for example, the code only needs to access the required fields of the sk_buff structure and never cares about its size. A system with offloading enabled, thanks to its reduced CPU load, deploys better in data center networks of 10Gbps, 25Gbps or even higher speed, so the method is scalable in high-speed data center networks while remaining compatible with systems that do not support offloading.
The host sender generates a random number per packet in order to modify the packet's uniquely determining tuple; to let the receiver restore the tuple, the random number must be stored somewhere in the packet. The Linux kernel generates random numbers from the various noise sources present while a computer runs, such as the timing of keystrokes, mouse movement and clicks, and hardware interrupts, producing a high-quality true-random sequence, and it offers the following prototype for programming: void get_random_bytes(void *buf, int nbytes). The function fills the buffer buf with nbytes random bytes, a random quantity of controllable length. If the invention is implemented in other ways, a different random number generator may be used, but a true random number generator works best in theory.
The host sender recalculates new source and destination port numbers in order to modify the packet's uniquely determining tuple, which the invention defines as (source IP, destination IP, source port, destination port). Modifying the source or destination IP is unwise, since it may prevent the packet from being delivered to the correct host; the source and destination ports have only local meaning, relevant to a single host, and are suitable for modification. Because the ECMP method deployed in network equipment takes the uniquely determining tuple as the hash input, generating an independent random number for every packet of the same flow randomizes and scrambles the tuples, dispersing the packets over equal-cost links along different paths and thus balancing the load. It follows that a true-random sequence scrambles best.
The uniquely determining tuple can be modified in many ways, all implementable in the custom Netfilter hook callback. Weighing convenience and effectiveness, the invention XORs the random number with the port numbers. The XOR operation has two excellent properties. First, for any numbers A and B, (A ^ B) ^ B = A; if A is a port number and B the random number, then A ^ B is the modified port number, and once the receiver extracts B, one more XOR trivially restores A. Second, XOR operates on binary digits in the machine and is highly efficient; it produces no carries, so the port number cannot overflow. Note that other reasonable ways of modifying the uniquely determining tuple are possible without departing from the key idea of the invention. Source and destination port numbers are each 2 bytes (16 bits) long, so the sender could naturally generate a 16-bit random number; the invention instead chooses a 6-bit random number, for two reasons. First, the random number must ultimately be stored in some header field for the receiver to extract, and the TCP/IP protocols leave few fields available, so a large random number is hard to store. One reasonable field is the differentiated services byte of the IP header, accessible through the tos field of the iphdr structure. It is one byte in size: the first 6 bits are used for QoS (Quality of Service) and the last 2 bits by the ECN (Explicit Congestion Notification) protocol, and the invention treats the first 6 bits as the place to store the random number.
Other random number sizes, other header fields and so on are possible for different applications and protocols without departing from the key idea of the invention. Second, a 6-bit random number is sufficient to scramble the uniquely determining tuples: after XOR, each port number varies over a range of 2^6 = 64 values. If the hash function chosen for the network's ECMP method behaves well enough, even a small change of input produces a completely different output, i.e. the packets are spread over completely different links. In addition, the invention modifies the source and destination port numbers simultaneously, increasing the variation of the tuple.
Both major drawbacks of the ECMP method are thereby greatly alleviated. First, when network flows differ in size: because the ECMP method balances load at the granularity of a single flow, a link carrying a large flow is loaded more heavily than a link carrying a small flow, so the load-balancing effect is poor. With the invention, the granularity of load balancing is each packet into which a network flow is divided; packets are usually of comparable size, and after being processed by the ECMP method of the network devices they select their sending links independently of one another, so statistically the load on each link will be relatively uniform. Second, without the invention, a hash collision in the ECMP method causes too many flows to be sent over the same link. The invention cannot eliminate hash collisions, but because its load-balancing granularity is finer, too many packets passing through the same link has a much smaller impact than too many flows passing through it, so the drawback of hash collisions is also alleviated.
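The per-packet spreading effect can be illustrated with a toy stand-in for a switch's ECMP hash (real devices use vendor-specific hash functions that the patent does not specify):

```c
#include <stdint.h>

/* Toy ECMP link selection: mix the address and port fields of a packet and
 * pick one of nlinks equal-cost links. With per-flow ECMP every packet of a
 * flow carries the same tuple and lands on the same link; once the ports
 * are XOR-scrambled per packet, packets of one flow can hash to different
 * links. */
unsigned pick_link(uint32_t saddr, uint32_t daddr,
                   uint16_t sport, uint16_t dport, unsigned nlinks) {
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);
    h ^= h >> 16;            /* simple avalanche-style mixing */
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return h % nlinks;
}
```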
When network card offload is turned on, what the invention modifies in the Linux kernel network protocol stack each time is the uniquely determined tuple of a large packet, typically close to 64 KB. After the large packet leaves the kernel protocol stack, its payload is segmented by the network card, which adds appropriate packet headers and recalculates the checksums to form data packets of the link Maximum Transmission Unit (MTU) size, usually 1500 bytes. It follows that the uniquely determined tuples of the series of small packets produced from the same large packet are all identical; the invention says that such a series of small packets forms a sub-flow of the original network flow. After being processed by the ECMP method of the network devices, a sub-flow is sent over a single link, so the granularity of load balancing is then the sub-flow. When network card offload is turned off, the granularity of load balancing is a single MTU-sized data packet; the cost of turning offload on is therefore a coarser load-balancing granularity. With offload on, if the network contains many small flows, the effect of the method is less pronounced, because load balancing over sub-flows is then similar to the ECMP method's load balancing over the original network flows. However, with a large number of small flows, the ECMP method's weakness in balancing network flows of different sizes is no longer significant, so the invention still achieves its goal.
If the network contains many large flows, load balancing at sub-flow granularity still works well, because most network flows are divided into several sub-flows of consistent size, over which the load is then balanced. In summary, coarsening the granularity to sub-flows does little harm, and the method remains effective with network card offload turned on.
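The sub-flow size argument can be made concrete with a back-of-the-envelope calculation, assuming a 1500-byte MTU and a 1460-byte TCP MSS (illustrative figures, not mandated by the patent):

```c
/* With TSO/GSO on, one large kernel packet of `payload` bytes is cut by
 * the NIC into ceil(payload / mss) wire packets that all share the same
 * (already scrambled) tuple, i.e. one sub-flow. */
unsigned subflow_len(unsigned payload, unsigned mss) {
    return (payload + mss - 1) / mss;
}
```

A 64 KB kernel packet thus becomes a sub-flow of roughly 45 MTU-sized packets, all pinned to one link by ECMP.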
The host receiver's processing of a data packet can be seen as the reverse of the sender's: the uniquely determined tuple of each packet is restored. Upper-layer protocols therefore need no modification, and the application processes on the local and remote hosts are unaware that the data packets they send have ever been modified.
When the method is implemented in the Linux kernel network protocol stack, it can be realized either as a kernel module or as a source-code patch. Since the processing logic of every host is identical, the same kernel module or source patch can be distributed to all hosts of the data center. The invention is simple, efficient and consistent, and is capable of actual deployment in a real data center.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiment; any technical solution within the idea of the present invention falls within the protection scope of the present invention.

Claims (2)

1. A method for balancing network flow load of a data center is characterized in that: by utilizing an ECMP method widely deployed in network equipment, a host sender and a host receiver respectively process each data packet in the same network flow, so that the packets are uniformly dispersed on different equivalent links to realize load balancing with finer granularity; the processing method of the host sender is to generate a random number for each packet, modify the unique determined tuple according to the random number, and embed the random number into a certain field of the packet header; the processing method of the host receiver is to extract a random number from a field of a packet header and restore a unique determined tuple;
the processing method of the host sender specifically comprises the following steps:
A1) acquiring a data packet to be sent by using a Linux kernel network protocol stack;
A2) acquiring a source port number and a destination port number of a data packet;
A3) generating a random number;
A4) recalculating a new source port number and a new destination port number;
A5) writing back the generated random number, the new source port number and the new destination port number to the data packet;
the processing method of the host receiver specifically comprises the following steps:
B1) acquiring a data packet to be received by utilizing a Linux kernel network protocol stack;
B2) acquiring a generated random number, a new source port number and a new destination port number of a write-back data packet;
B3) calculating and restoring the original source port number and the original destination port number;
B4) writing the restored source port number and destination port number back to the data packet;
in the step A1), the data packet to be sent is acquired from the Linux kernel network protocol stack in a custom callback function of the NF_INET_LOCAL_OUT hook of the Netfilter framework; the acquired data packet is represented in the form of a sk_buff structure;
if the transport layer protocol in the step A2) is TCP, a pointer to the TCP header in the sk_buff is obtained through the tcp_hdr function, and, according to the definition of the TCP header tcphdr structure, the source port number and the destination port number of the data packet are obtained from the source field and the dest field respectively;
if the transport layer protocol is UDP, a pointer to the UDP header in the sk_buff is obtained through the udp_hdr function, and, according to the definition of the UDP header udphdr structure, the source port number and the destination port number of the data packet are obtained from the source field and the dest field respectively;
in the step A3), the true random number generating function get_random_bytes provided by the Linux kernel is used, and the number of bytes of storage space occupied by the obtained random number can be specified;
in the step A4), the new source port number and the new destination port number are calculated by taking the lower 6 bits of the random number and XOR-ing them with the lower 6 bits of the two port numbers respectively to obtain the two new port numbers; the two port numbers thus vary within the valid range [0, 65535];
if the transport layer protocol in the step A5) is TCP, the new source port number and the new destination port number are written back to the source field and the dest field of the tcphdr structure, and the generated random number is placed in the differentiated services field of the IP header of the packet header;
if the transport layer protocol is UDP, the new source port number and the new destination port number are written back to the source field and the dest field of the udphdr structure, and the generated random number is placed in the differentiated services field of the IP header of the packet header.
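Steps A2) through A5) can be modeled in user space as follows; the real implementation operates on a struct sk_buff inside a Netfilter hook callback, and the struct and function names here are illustrative, not kernel types:

```c
#include <stdint.h>

/* User-space model of the sender's per-packet processing. */
struct pkt { uint16_t sport, dport; uint8_t tos; };

void sender_process(struct pkt *p, uint8_t rnd) {
    uint8_t rnd6 = rnd & 0x3F;                 /* A4) keep the lower 6 bits   */
    p->sport ^= rnd6;                          /* A4) scramble both ports     */
    p->dport ^= rnd6;
    p->tos = (uint8_t)((p->tos & 0x03)         /* A5) preserve the 2 ECN bits */
                       | (rnd6 << 2));         /*     store rnd in DSCP bits  */
}
```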
2. The method for data center network flow load balancing according to claim 1, wherein: in the step B1), the data packet to be received is acquired in a custom callback function of the NF_INET_LOCAL_IN hook of the Netfilter framework in the Linux kernel network protocol stack;
if the transport layer protocol in the step B2) is TCP, a pointer to the TCP header in the sk_buff is obtained through the tcp_hdr function, and, according to the definition of the TCP header tcphdr structure, the new source port number and the new destination port number of the data packet are obtained from the source field and the dest field; a pointer to the IP header in the sk_buff is obtained through the ip_hdr function, and the random number is obtained from the tos field of the IP header iphdr structure;
if the transport layer protocol is UDP, a pointer to the UDP header in the sk_buff is obtained through the udp_hdr function, and, according to the definition of the UDP header udphdr structure, the new source port number and the new destination port number of the data packet are obtained from the source field and the dest field respectively; a pointer to the IP header in the sk_buff is obtained through the ip_hdr function, and the random number is obtained from the tos field of the IP header iphdr structure;
in the step B3), the original source port number and destination port number are calculated and restored by XOR-ing the lower 6 bits of the new source port number and the new destination port number with the random number once more;
if the transport layer protocol in the step B4) is TCP, the restored source port number and destination port number are written back to the source field and the dest field of the tcphdr structure;
if the transport layer protocol is UDP, the restored source port number and destination port number are written back to the source field and the dest field of the udphdr structure.
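Correspondingly, steps B2) through B4) reduce to extracting the random number from the DSCP bits and XOR-ing once more; again a user-space sketch with illustrative names rather than kernel types:

```c
#include <stdint.h>

/* User-space model of the receiver's per-packet processing. */
struct rpkt { uint16_t sport, dport; uint8_t tos; };

void receiver_process(struct rpkt *p) {
    uint8_t rnd6 = (uint8_t)(p->tos >> 2);     /* B2) extract the random number */
    p->sport ^= rnd6;                          /* B3) XOR again restores the    */
    p->dport ^= rnd6;                          /*     original port numbers     */
}
```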
CN201711372360.4A 2017-12-19 2017-12-19 Method for balancing network flow load of data center Active CN108337183B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711372360.4A CN108337183B (en) 2017-12-19 2017-12-19 Method for balancing network flow load of data center


Publications (2)

Publication Number Publication Date
CN108337183A CN108337183A (en) 2018-07-27
CN108337183B true CN108337183B (en) 2021-10-26

Family

ID=62923227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711372360.4A Active CN108337183B (en) 2017-12-19 2017-12-19 Method for balancing network flow load of data center

Country Status (1)

Country Link
CN (1) CN108337183B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113765919B (en) * 2021-09-07 2023-11-03 深圳市瑞云科技有限公司 Method for improving UDP message sending efficiency of Linux system
CN113890789B (en) * 2021-09-29 2023-03-21 华云数据控股集团有限公司 UDP tunnel traffic shunting method and traffic forwarding method suitable for data center

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102694721A (en) * 2011-03-21 2012-09-26 阿瓦雅公司 Usage of masked Ethernet addresses between transparent interconnect of lots of links (trill) routing bridges
US9571400B1 (en) * 2014-02-25 2017-02-14 Google Inc. Weighted load balancing in a multistage network using hierarchical ECMP


Non-Patent Citations (1)

Title
Data Center Network Traffic Optimization Based on Equal-Cost Multipath; An Lu; China Master's Theses Full-text Database, Information Technology; 20150115; full text *


Similar Documents

Publication Publication Date Title
US11601359B2 (en) Resilient network communication using selective multipath packet flow spraying
US9379982B1 (en) Adaptive stateless load balancing
US10868739B2 (en) Distributed deep packet inspection
US9736278B1 (en) Method and apparatus for connecting a gateway router to a set of scalable virtual IP network appliances in overlay networks
Shahbaz et al. Elmo: Source routed multicast for public clouds
US9929897B2 (en) Performing a protocol, such as micro bidirectional forwarding detection, on member links of an aggregated link that uses an address of the aggregated link
US10135736B1 (en) Dynamic trunk distribution on egress
CN113132249A (en) Load balancing method and equipment
US20160218960A1 (en) Multipath bandwidth usage
CN113676361A (en) On-demand probing for quality of experience metrics
WO2014194423A1 (en) Method and apparatus for providing software defined network flow distribution
US9917891B2 (en) Distributed in-order load spreading resilient to topology changes
JP2009093348A (en) Information processing apparatus and information processing system
CN108337183B (en) Method for balancing network flow load of data center
US11716306B1 (en) Systems and methods for improving packet forwarding throughput for encapsulated tunnels
US20230006937A1 (en) Packet flow identification with reduced decode operations
Fu et al. Performance comparison of congestion control strategies for multi-path TCP in the NORNET testbed
CN102857547B (en) The method and apparatus of distributed caching
US7660906B1 (en) Data delivery system and method
WO2023116580A1 (en) Path switching method and apparatus, network device, and network system
US10826831B2 (en) Dynamic protocol independent multicast load balancing
Shahbaz et al. Elmo: Source-routed multicast for cloud services
US11394663B1 (en) Selective packet processing including a run-to-completion packet processing data plane
US11165721B1 (en) Reprogramming multicast replication using real-time buffer feedback
CN112838983B (en) Data transmission method, system, device, proxy server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant