CN116158064A - Facilitating distributed SNAT services - Google Patents

Facilitating distributed SNAT services

Info

Publication number
CN116158064A
CN116158064A CN202180061371.9A CN202180061371A CN116158064A CN 116158064 A CN116158064 A CN 116158064A CN 202180061371 A CN202180061371 A CN 202180061371A CN 116158064 A CN116158064 A CN 116158064A
Authority
CN
China
Prior art keywords
address
dsat
middlebox service
ipv6
destination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180061371.9A
Other languages
Chinese (zh)
Inventor
S. Boutros
M. Kancherla
J. Jain
A. Sengupta
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VMware LLC
Original Assignee
VMware LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/931,196 external-priority patent/US11616755B2/en
Priority claimed from US16/931,207 external-priority patent/US11606294B2/en
Application filed by VMware LLC filed Critical VMware LLC
Publication of CN116158064A publication Critical patent/CN116158064A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/25 Mapping addresses of the same type
    • H04L61/2503 Translation of Internet protocol [IP] addresses
    • H04L61/2521 Translation architectures other than single NAT servers
    • H04L61/2532 Clique of NAT servers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1023 Server selection for load balancing based on a hash applied to IP addresses or costs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/25 Mapping addresses of the same type
    • H04L61/2503 Translation of Internet protocol [IP] addresses
    • H04L61/2517 Translation of Internet protocol [IP] addresses using port numbers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/25 Mapping addresses of the same type
    • H04L61/2503 Translation of Internet protocol [IP] addresses
    • H04L61/2592 Translation of Internet protocol [IP] addresses using tunnelling or encapsulation

Abstract

Some embodiments of the present invention provide novel methods for facilitating the distributed SNAT (dSNAT) middlebox service operations of a first network at a gateway device between the first network and a second network and at host computers in the first network on which the middlebox service operations are performed. The novel methods enable a dSNAT that provides stateful middlebox services at multiple host computers, thus avoiding the bottleneck problem associated with stateful middlebox services provided at gateways, and also significantly reduce the need to redirect packets received at the wrong host, by using the capability of off-the-shelf gateway devices to perform IPv6 encapsulation of IPv4 packets and by assigning a locally unique IPv6 address, for use by the gateway device, to each host executing a dSNAT middlebox service instance.

Description

Facilitating distributed SNAT services
Background
Many networks rely on source network address translation (SNAT) to translate addresses in an address space used within the network to globally unique addresses when communicating with external networks. Because SNAT is a stateful service, some networks provide SNAT at a centralized location (e.g., a SNAT server). However, providing a centralized SNAT may lead to bottlenecks, as all traffic using the SNAT must traverse the centralized SNAT provider. To address this bottleneck problem, some networks use a distributed SNAT architecture. However, the distributed SNAT architecture has its own challenges. For example, because the SNAT operation performed by each instance of the distributed SNAT uses the same IP address, in some cases traffic traversing the provider gateway device will be forwarded to a randomly selected SNAT instance, with the result that, for a system having "N" distributed SNAT instances, (N-1) out of every N packets will be directed to a distributed SNAT instance that does not store state information for the packet.
Similarly, stateful load balancing operations for a set of workload compute nodes that execute on multiple host computers and are reachable at a shared virtual internet protocol (VIP) address suffer from the same problems. Thus, for providing middlebox services such as SNAT and stateful load balancing, a solution is needed that solves both the bottleneck and the misdirection problems.
Disclosure of Invention
Some embodiments of the present invention provide novel methods for facilitating distributed middlebox service operations (e.g., distributed SNAT (dSNAT) or distributed load balancing (dLB) middlebox services) for a first network at host computers in the first network on which the middlebox service operations are performed and at a gateway device between the first network and a second network. These novel methods enable a distributed middlebox service (e.g., dSNAT or dLB) that provides stateful middlebox services at multiple host computers, thus avoiding the bottleneck problems associated with providing stateful middlebox services at gateways, and also significantly reduce the need to redirect packets received at the wrong host, by using the capability of off-the-shelf gateway devices to perform IPv6 encapsulation of IPv4 packets and by assigning a locally unique IPv6 address to each host executing a distributed middlebox service instance.
The methods configure a gateway device to receive a packet destined for a distributed middlebox service and to identify an IP version 6 (IPv6) address, based on a specified port and the IPv4 address in the destination internet protocol version 4 (IPv4) header of the received packet, for forwarding the received packet to the particular host computer on which the middlebox service instance associated with the destination IPv4 address executes. In some cases, the destination IPv4 address is an IPv4 VIP address associated with a dLB middlebox service, or an IPv4 address used by a dSNAT service as the source address for packets sent from the first network to an external network. The gateway device then encapsulates the packet with an IPv6 header using the identified IPv6 address and forwards the encapsulated packet based on the gateway device's IPv6 routing table.
The host computer is configured to receive the encapsulated packet destined for the identified IPv6 address from the gateway device and to remove the encapsulation in order to provide the inner IPv4 packet, based on the IPv4 address in the header of the inner IPv4 packet, to a middlebox service instance executing on the host computer. The middlebox service instance performs an address replacement operation (e.g., a lookup in a connection tracker that associates translated IP addresses and ports with original IP addresses and ports, or a replacement of the VIP destination address and port with a workload IP address and port) to replace the IP address and port number in the IPv4 header with the IPv4 address and port used by the source machine in the first network. In some embodiments, a middlebox service instance executing on the host computer is assigned a range of port numbers to use when performing middlebox service operations for packets sent from the first network to the external network (in the case of dSNAT), or on which to receive packets (in the case of dLB).
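The address replacement operation described above can be sketched with a minimal, hypothetical connection tracker; the class and method names below are illustrative only and are not taken from any actual implementation:

```python
class ConnectionTracker:
    """Hypothetical dSNAT connection tracker: maps translated (NAT) ports
    back to the original internal IPv4 address and port."""

    def __init__(self):
        # Keyed on (nat_port, remote_ip) so the same NAT port can be
        # reused across flows to different external machines.
        self._by_translated = {}

    def record(self, orig_ip, orig_port, nat_port, remote_ip):
        """Store the translation chosen for an outbound flow."""
        self._by_translated[(nat_port, remote_ip)] = (orig_ip, orig_port)

    def reverse_lookup(self, nat_port, remote_ip):
        """For an inbound packet, recover the internal address and port
        to write back into the IPv4 header (None if no state exists)."""
        return self._by_translated.get((nat_port, remote_ip))
```

An inbound packet whose connection state lives on a different host would miss in such a table, which is precisely the misdirection problem the IPv6 encapsulation scheme is designed to avoid.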
In some embodiments, the host computer is configured to advertise the availability of an IPv6 address prefix that is based on the IPv4 address of the middlebox service instance serving as the source address for packets sent from the first network to the external network and on the range of port numbers assigned to the middlebox service instance. In some embodiments, the advertised IPv6 address prefix begins with 16 bits that are not assigned as globally unique in IPv6 (e.g., FC00), followed by the IPv4 address used by the middlebox service, and then the set of bits of the 16-bit port field that are common to the range of port numbers assigned to the middlebox service instance executing on the host computer (e.g., the first 6 bits are common to an assigned range of 1024 port numbers). In some embodiments, packets destined for the external network that are processed by the middlebox service instance executing on the host computer are sent from the host without being encapsulated in an IPv6 header that uses the advertised IPv6 address as the source IP address. Although the IPv6 address associated with the middlebox service instance is not used to encapsulate these packets, in some embodiments other encapsulations may be used to reach the gateway device.
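The prefix construction described above can be sketched as follows. This is an illustrative reading of the scheme (FC00, then the IPv4 address, then the common high bits of the port field), not code from the patent, and the example addresses and port ranges are arbitrary; it assumes the range size is a power of two so that a fixed set of high-order port bits is shared:

```python
import ipaddress

def advertised_prefix(nat_ipv4: str, port_range_start: int, range_size: int) -> str:
    """Build the locally unique IPv6 prefix a host advertises for its
    middlebox service instance: 16 bits of FC00, then the 32-bit IPv4
    service address, then the high-order bits of the 16-bit port field
    that are common to the assigned port range."""
    common_bits = 16 - (range_size.bit_length() - 1)  # e.g., 6 bits for 1024 ports
    ipv4 = int(ipaddress.IPv4Address(nat_ipv4))
    top64 = (0xFC00 << 48) | (ipv4 << 16) | port_range_start
    prefix_len = 16 + 32 + common_bits                # 54 for 1024-port ranges
    return f"{ipaddress.IPv6Address(top64 << 64)}/{prefix_len}"

# A host assigned ports 1024-2047 for service IP 203.0.113.1 would advertise:
# advertised_prefix("203.0.113.1", 1024, 1024)  ->  "fc00:cb00:7101:400::/54"
```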
In some embodiments, the advertisement is made by a Border Gateway Protocol (BGP) instance (e.g., in a managed forwarding element) executing on the host computer. In some embodiments, the advertisement is made to a route reflector (e.g., a routing server) that advertises the availability of the IPv6 address prefix at the host computer to other network elements, including the gateway device or a group of gateway devices. In some embodiments, the advertisement includes instructions for the gateway device to identify an IPv6 address based on a port number and an IPv4 address of a packet received at the gateway device and to encapsulate the IPv4 packet using the identified IPv6 address. In other embodiments, the advertised IPv6 address relies on existing functionality of the hardware gateway device for performing IPv6 encapsulation of IPv4 packets.
In some embodiments, a cluster of controller computers (i.e., a set of one or more controller computers) of the first network provides configuration information to network elements to facilitate the distributed middlebox service operations of the first network. In some embodiments, the configuration information includes IPv6 routing table entries and a set of middlebox service records provided to the gateway device. A middlebox service record maps a combination of the IPv4 address and destination port number used by a particular middlebox service operation to an IPv6 destination address. In some embodiments, the middlebox service record is a policy-based routing (PBR) rule that defines an algorithm for generating an IPv6 address from the IPv4 destination address and a port number included in an IPv4 header, and that specifies the IPv4 destination addresses to which the algorithm should be applied. In some embodiments, each IPv6 routing table entry identifies an IPv6 address prefix associated with a particular host computer, of a set of multiple host computers that each execute a middlebox service instance, and a next hop towards that particular host computer.
The controller computer cluster configures the middlebox service instances to use a particular IPv4 address when performing middlebox service operations and assigns each middlebox service instance of a particular middlebox service a non-overlapping range of port numbers to use. In some embodiments, the IPv4 address and port number range are provided to a host computer (e.g., to an MFE of the host computer) to identify an IPv6 address prefix corresponding to the IPv4 address and the assigned range of port numbers and to advertise the availability of the identified IPv6 address prefix at the host computer. In some embodiments, the size of the range of port numbers assigned to middlebox service instances is fixed by an administrator based on the maximum number of middlebox service instances expected (e.g., for a maximum of 64 expected middlebox service instances, 64 different port number ranges are created, each range including 1024 ports, with each port number range assigned to one middlebox service instance at startup). In other embodiments, the size of the port number ranges is dynamic and may vary based on the number of active middlebox service instances. The size of the port number range may also vary between middlebox service instances. For example, a larger range of port numbers is assigned to a first middlebox service instance executing on a host computer that executes a greater number of workload machines than to a second middlebox service instance executing on a host computer that executes a lesser number of workload machines.
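The fixed-size assignment scheme in the example above (64 instances, 1024 ports each) amounts to partitioning the 16-bit port space into equal power-of-two slices. A minimal sketch, with an illustrative function name:

```python
def partition_ports(num_instances: int):
    """Split the 16-bit port space into equal, non-overlapping ranges,
    one per middlebox service instance. num_instances should be a power
    of two so that each range shares a fixed set of high-order port bits."""
    size = 65536 // num_instances            # 1024 ports each for 64 instances
    return [(i * size, (i + 1) * size - 1) for i in range(num_instances)]

# partition_ports(64)[0] -> (0, 1023); partition_ports(64)[1] -> (1024, 2047)
```

Because each slice shares its high-order port bits, the slice boundaries line up exactly with the common port bits embedded in the advertised IPv6 prefixes.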
The foregoing summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The detailed description that follows and the drawings that are referred to in the detailed description further describe the embodiments described in this summary as well as other embodiments. Accordingly, a full review of the summary, the detailed description, the drawings, and the claims is needed in order to understand all the embodiments described herein. Furthermore, the claimed subject matter is not to be limited by the illustrative details in the summary, the detailed description, and the drawings, but rather is to be defined by the appended claims, because the claimed subject matter can be embodied in other specific forms without departing from the spirit of the subject matter.
Drawings
The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.
Fig. 1 illustrates an exemplary network in which a novel method for facilitating providing distributed SNAT middlebox service operations for a first network is implemented.
Figure 2 conceptually illustrates a process performed by a gateway device to process received packets destined for a particular middlebox service instance.
Fig. 3A illustrates packet processing at a gateway device as depicted in fig. 1 for a first packet in a particular packet stream received from an external network.
Fig. 3B illustrates packet processing at the gateway device as depicted in fig. 1 for subsequent packets in a particular packet stream for which the packet depicted in fig. 3A is the first packet received from the external network.
Fig. 4 illustrates packet processing at a gateway device as depicted in fig. 1 for a subsequent packet in a particular packet stream destined for an LB VIP for which packets were previously received from an external network.
Fig. 5 conceptually illustrates a process executing at a host computer for processing received IPv6 packets destined for a middlebox service instance executing on the host computer.
Fig. 6 illustrates a packet being sent from an external machine to a guest machine (GM) in an internal network.
Figure 7 conceptually illustrates a process performed by a NAT instance that processes a first packet in a packet flow destined for a destination machine in an external network.
Fig. 8 illustrates that the first packet in the packet flow is sent from the GM and is handled by the NAT instance executing on the same host computer.
Figure 9 conceptually illustrates a process for generating configuration data for different network elements that provide middlebox services and facilitate providing middlebox services.
FIG. 10 illustrates a cluster of controller computers in a data center sending different types of configuration data to different network elements.
Figure 11 conceptually illustrates a process performed by a gateway device to facilitate providing middlebox services based on received configuration data.
Figure 12 conceptually illustrates a process for configuring a host computer to execute a distributed middlebox service instance and advertising an IPv6 address prefix associated with the middlebox service instance executing on the host computer.
Fig. 13 illustrates three different advertised exemplary IPv6 address prefixes that are used in different embodiments to advertise the availability of the service at the host computer, and corresponding exemplary destination IPv6 addresses that are generated by the gateway device for use in the IPv6 encapsulation header to forward the packet to a particular service instance executing on the host computer making the advertisement.
Figure 14 conceptually illustrates a set of data exchanges between network elements for migrating computing nodes.
FIG. 15 conceptually illustrates a computer system with which some embodiments of the invention are implemented.
Detailed Description
In the following detailed description of the present invention, numerous details, examples, and embodiments of the present invention are set forth and described. It will be apparent, however, to one skilled in the art that the invention is not limited to the illustrated embodiments and that the invention may be practiced without some of the specific details and examples that are discussed.
Some embodiments of the present invention provide novel methods for facilitating distributed middlebox service operations (e.g., distributed SNAT (dSNAT) or distributed load balancing (dLB) middlebox services) for a first network at host computers in the first network on which the middlebox service operations are performed and at a gateway device between the first network and a second network. These novel methods enable a distributed middlebox service (e.g., dSNAT or dLB) that provides a stateful middlebox service at multiple host computers, thus avoiding the bottleneck problems associated with providing a stateful middlebox service at a gateway, and also significantly reduce the need to redirect packets received at the wrong host, by using the capability of off-the-shelf gateway devices to perform IPv6 encapsulation of IPv4 packets and by assigning a locally unique IPv6 address to each host executing a distributed middlebox service instance. The following discussion focuses in part on dSNAT and dLB middlebox services; however, one of ordinary skill in the art will appreciate that the methods discussed may be applied to any distributed, stateful middlebox service that uses the same IP address at multiple hosts.
The method configures the gateway device to receive packets destined for an IPv4 VIP of a dLB service, or for an IPv4 address of a dSNAT service used as the source address for packets sent from the first network to an external network, and to identify an IP version 6 (IPv6) address based on a port specified in the IPv4 header of the received packet and the IPv4 destination address, the IPv6 address being used to forward the received packet to the host computer on which the middlebox service operation is performed. The gateway device then encapsulates the packet with an IPv6 header using the identified IPv6 address and forwards the encapsulated packet based on the gateway device's IPv6 routing table.
As used in this document, a packet (packet) refers to a set of bits of a particular format that are sent across a network. In some embodiments, the packet is referred to as a data message. Those of ordinary skill in the art will recognize that the term "packet" is used in this document to refer to a collection of various formatted bits transmitted across a network. Formatting of these bits may be specified by standardized protocols or non-standardized protocols. Examples of packets following standardized protocols include ethernet frames, IP packets, TCP segments, UDP datagrams, and the like. Furthermore, as used in this document, references to the L2, L3, L4, and L7 layers (or layer 2, layer 3, layer 4, and layer 7) are references to the second data link layer, third network layer, fourth transport layer, and seventh application layer, respectively, of the OSI (open systems interconnection) layer model.
Further, as used in this document, the term "managed forwarding element" (MFE) refers to a software forwarding element or a hardware forwarding element configured by a controller computer cluster (i.e., a set of one or more controller computers that provides configuration data to network elements) to implement a logical network comprising a set of logical forwarding elements (LFEs). In some embodiments, each LFE is a distributed forwarding element implemented by configuring multiple MFEs on multiple host computers. To this end, in some embodiments, each MFE or a module associated with the MFE is configured to encapsulate the data messages of the LFE with an overlay network header that contains a virtual network identifier (VNI) associated with the overlay network. As such, in the discussion below, an LFE is referred to as an overlay network construct that spans multiple host computers.
In some embodiments, the LFEs also span configured hardware forwarding elements (e.g., top-of-rack switches). In some embodiments, the set of LFEs includes logical switches implemented by configuring multiple software switches or related modules on multiple host computers. In other embodiments, the LFEs may be other types of forwarding elements (e.g., logical routers), or any combination of forwarding elements that form a logical network or a portion thereof (e.g., logical switches and/or logical routers). Many examples of LFEs, logical switches, logical routers, and logical networks exist today, including those provided by VMware's NSX network and service virtualization platform.
Fig. 1 illustrates an exemplary network 100 in which the novel methods for facilitating the provision of distributed SNAT operations for a first network are implemented. Fig. 1 illustrates a network 100 including an external network 101 that connects a set of machines 102 external to a data center 105 (e.g., a public cloud data center or a set of data centers) with guest machines (GMs) 126 executing on a plurality of host computers 120 in the data center 105. Gateway device 110 sits between the internal network and external network 101 and is configured to facilitate providing distributed middlebox services for communication between external machines 102 and internal compute nodes (e.g., GMs 126), as discussed below with respect to fig. 11. Gateway device 110 includes a cache 111 and a pre-routing rule set 112, where cache 111 stores information about forwarding decisions made for previously received packets, and, in some embodiments, the pre-routing rule set includes policy-based routing rules based on IP (IPv4 or IPv6) addresses. After a lookup in cache 111 and a lookup in pre-routing rule set 112 are performed, a packet may be routed using an IPv4 routing table or an IPv6 routing table, or may be sent to an IPv6 encapsulator to be encapsulated in an IPv6 header and then routed based on the IPv6 routing table.
The system 100 also includes a set of controller computers 140 that provides configuration information to the set of host computers 120 to implement a set of logical forwarding elements (e.g., using MFEs 121), an IPv6 processing module 122 that processes received IPv6 packets, a routing machine (RM) 123 that acts as a BGP instance interacting with a set of routing servers 130, a distributed middlebox service instance 124 (e.g., a SNAT instance 124 that provides the dSNAT middlebox service in the illustrated embodiment), and a set of compute nodes (e.g., GMs 126). Host computer 120 also includes a set of NAT records 125, which in some embodiments is a cache that records associations between internal IP address/port pairs and the ports selected by the SNAT instance 124 as the external ports for each IP address/port pair. In some embodiments, these cache records also store the IP address of the external machine along with the port selected by the SNAT instance 124, in order to allow the SNAT instance 124 to use the same port for multiple data message flows established with different external machines using different IP addresses.
The system 100 also includes a set of routing servers 130 (also referred to as route reflectors), each of which receives routing information from multiple network elements and provides that routing information to other network elements to simplify the exchange of routing information. For example, instead of using a full mesh that connects every pair of BGP instances (e.g., RMs 123), each BGP instance interacts with a routing server 130, reducing the number of advertisements each BGP instance must make. The data center 105 also includes an intervening fabric 150 that provides the physical connections between the illustrated network elements.
The functions of the various elements of fig. 1 will be discussed in more detail with respect to figs. 2-4. Fig. 2 conceptually illustrates a process 200 performed by a gateway device (e.g., gateway device 110) to process received packets destined for a particular middlebox service instance (e.g., SNAT instance 124a). In some embodiments, the gateway device is an off-the-shelf gateway device that has the capability to encapsulate an IPv4 packet in an IPv6 packet and to generate the IPv6 header for the IPv6 packet using a set of rules or records that specifies the IPv6 header to use based on IPv4 header values. In other embodiments, the gateway device is a fully programmable forwarding element programmed to perform the operations of process 200. One of ordinary skill in the art will appreciate that process 200 may be performed for packets destined for an IPv4 address associated with a dSNAT middlebox service, with a dLB middlebox service for a set of load-balanced workload compute node groups, or with any distributed middlebox service that uses the same IP address at multiple middlebox service instances executing on multiple host computers.
Process 200 begins by receiving (at 210) an IPv4 packet destined for a particular middlebox service instance (e.g., SNAT instance 124a) executing on a particular host computer (e.g., host computer 120a), the IPv4 packet having a destination address associated with the middlebox service (i.e., with all the middlebox service instances). The IPv4 header identifies the source machine in the external network by a source IP address, and identifies the destination IP address and port used by the SNAT instance that handles the packets of the communication session (also referred to as a packet flow or connection) between the external machine and the internal compute node served by the middlebox service instance.
The process then determines (at 220) that the IPv4 packet needs to be encapsulated in an IPv6 packet. In some embodiments, the determination for the first packet in a particular packet stream received from an external machine is made based on a middlebox service record. In some embodiments, the middlebox service record is an IPv4 routing table entry for the IPv4 destination address, indicating that the packet needs to be encapsulated in an IPv6 packet. In some embodiments, the IPv4 routing entry also indicates (1) a specific IPv6 destination address for encapsulating the IPv4 packet based on IPv4 header values (e.g., the IPv4 destination address and the destination port (for dSNAT) or source port (for dLB)), or (2) an algorithm for generating the IPv6 destination address to use in the IPv6 encapsulation header. In some embodiments, the middlebox service record is a policy-based routing (PBR) rule that specifies the encapsulation of all packets destined for the IPv4 address associated with the middlebox service. As with the IPv4 routing entry, the PBR rule may specify an IPv6 destination address or a method for generating the IPv6 destination address. In some embodiments, the PBR rules are included in the pre-routing rules 112. In some embodiments, after the first packet of a particular packet stream is received from an external machine, the determination for subsequent packets of the particular packet stream is based on a cache record that includes the determination made for the first packet of the particular stream and that is stored in cache 111.
After determining (at 220) that the IPv4 packet needs to be encapsulated in an IPv6 packet, process 200 identifies (at 230) the IPv6 destination address to be used in the IPv6 encapsulation header based on IPv4 header values (e.g., the destination IPv4 address and the destination port (for dSNAT) or source port (for dLB)). In some embodiments, this identification is based on an IPv6 destination address specified for the IPv4 destination address and port pair in a middlebox service record. In other embodiments, this identification is based on an algorithm or other programmatic method that generates the IPv6 destination address from the IPv4 header. In some embodiments, the algorithm takes an IPv6 prefix for locally unique addresses (e.g., FC00::/8) and appends the IPv4 destination address and the destination (or source) port, followed by zeros (zeros are used for simplicity, but in some embodiments any trailing group of bits can work).
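Under the stated assumptions (a locally unique FC00-style prefix, the IPv4 destination address, the 16-bit port, and zero padding), the gateway's address-generation algorithm can be sketched as follows; the example addresses are arbitrary and the function name is illustrative:

```python
import ipaddress

def ipv6_encap_destination(ipv4_dst: str, port: int) -> str:
    """Generate the IPv6 destination address for the encapsulation header:
    FC00 (16 bits) | IPv4 destination (32 bits) | port (16 bits) | zeros."""
    ipv4 = int(ipaddress.IPv4Address(ipv4_dst))
    top64 = (0xFC00 << 48) | (ipv4 << 16) | (port & 0xFFFF)
    return str(ipaddress.IPv6Address(top64 << 64))

# ipv6_encap_destination("203.0.113.1", 5000) -> "fc00:cb00:7101:1388::"
```

Because the port bits sit immediately after the IPv4 address, any address generated this way falls inside the prefix advertised by whichever host owns the port range containing that port.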
After the IPv6 destination address is identified (at 230), the IPv4 packet is encapsulated (at 240) with an IPv6 header that uses the identified IPv6 address as the destination address. In some embodiments, the encapsulation is performed by an IPv6 encapsulation module (e.g., IPv6 encapsulator 114). In some embodiments, the destination port is the IPv4 destination port, but in other embodiments it may be any port. In some embodiments, the source IPv6 address and port are an IPv6 address and port associated with the gateway device. Process 200 then performs (at 250) a lookup in an IPv6 routing table (e.g., IPv6 routing table 115) to identify the next hop for the encapsulated data message. As will be discussed in more detail below, the gateway device is configured with a set of IPv6 routes (i.e., routing table entries) that indicates, for each IPv6 address prefix in a set of IPv6 address prefixes associated with the set of host computers executing the middlebox service instances, a next hop for that IPv6 address prefix. In some embodiments, the IPv6 address prefixes are provided by the controller computers (e.g., as static routes). In other embodiments, a host computer (e.g., a BGP instance executing on the host computer) advertises an IPv6 address prefix as being available at the host computer. As will be discussed in more detail below, each middlebox service instance is assigned a range of port numbers that can be used to generate a unique IPv6 address prefix based on the algorithm described above for generating the IPv6 destination address.
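The routing-table lookup can be illustrated with a naive longest-prefix match over the per-host prefixes. A real gateway would use its forwarding table rather than a linear scan, so this is only a behavioral sketch with made-up prefixes and next-hop labels:

```python
import ipaddress

def lookup_next_hop(routes, dst):
    """routes: list of (ipv6_prefix, next_hop) pairs, one per host-computer
    advertisement; returns the next hop of the longest matching prefix,
    or None if no route covers the destination address."""
    addr = ipaddress.IPv6Address(dst)
    best = None
    for prefix, next_hop in routes:
        net = ipaddress.IPv6Network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None
```

With two hosts advertising /54 prefixes for adjacent 1024-port ranges, an encapsulated packet is steered to exactly the host whose range contains the port embedded in the destination address.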
Finally, process 200 forwards (at 260) the IPv6 packet to the identified next hop towards the middlebox service instance, and the process ends. In some embodiments, the packet is forwarded through an intervening network fabric (e.g., intervening fabric 150). In some embodiments, the forwarding elements in the intervening fabric learn the next hop to reach the destination IPv6 address using standard protocols (such as any or all of BGP, IGP, and ARP) or any other route-learning mechanism known to those of ordinary skill in the art.
Figs. 3A, 3B, and 4 illustrate packet-processing examples for a first packet and subsequent packets of particular flows for different distributed middlebox services. Fig. 3A illustrates packet processing at gateway device 110, as depicted in fig. 1, for a first packet 320a in a particular packet flow received from external network 101. As shown, packet 320a is received from machine 102 in external network 101, destined for the NAT IP address and port N (i.e., the port selected by the SNAT service as the source port). The packet is first compared to the cache entries in cache 111 (which does not yet include cache record 111a). Since no cache record is found, a lookup is performed in the pre-routing rule set 112, and in the illustrated embodiment, pre-routing rule 112a is identified as applying to the packet based on the destination IP (i.e., NAT IP). Based on identifying that pre-routing rule 112a applies to the packet, cache record 111a is created in cache 111. In the illustrated embodiment, cache record 111a identifies the result of the lookup in the pre-routing rules, but one of ordinary skill in the art will appreciate that in other embodiments, the cache record identifies a particular IPv6 address, or an IPv6 encapsulation address and next hop, for encapsulating the packets in the flow based on the rule.
According to pre-routing rule 112a, identified based on the destination IPv4 address (i.e., the IPv4 address (NAT IP) associated with the dSNAT service), the packet is passed to IPv6 packet encapsulator 114 for encapsulation, rather than to IPv4 routing table 113 for forwarding based on the IPv4 header values. In the illustrated embodiment, pre-routing rule 112a identifies, to IPv6 encapsulator 114, an encapsulation rule stored by the IPv6 encapsulator for performing the encapsulation as part of passing the packet. In other embodiments, passing the packet to IPv6 encapsulator 114 includes sending the rule (e.g., algorithm) for generating the encapsulation header values. IPv6 encapsulator 114 then encapsulates the packet with an IPv6 header based on the identified rule (i.e., rule 114a).
In the illustrated embodiment, the destination IPv6 address is generated by using an FC00 prefix for a locally unique address, followed by the destination IP (i.e., NAT IP), then the destination port (Port N), then zeros (i.e., FC00:NAT IP:PortN::). Other prefixes or padding bits are used in other embodiments, and one of ordinary skill in the art will recognize that this is just one of many possible algorithms for generating an IPv6 destination address, chosen here for simplicity. In some embodiments, the IPv6 destination port, source IPv6 address, and IPv6 source port are the original destination port, a source IPv6 address associated with gateway device 110, and a randomly selected IPv6 source port, respectively. However, those of ordinary skill in the art will appreciate that other destination ports and source IPv6 addresses and ports are used in other embodiments without affecting the routing of packets to particular hosts and dSNAT instances.
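The address-generation algorithm described above can be illustrated with the following sketch (an informal illustration only; the helper name and example values are hypothetical, and the field layout follows the FC00:NAT IP:PortN:: example of this embodiment):

```python
import ipaddress

def make_ipv6_destination(nat_ip: str, port: int) -> ipaddress.IPv6Address:
    """Build the destination address FC00:<NAT IP>:<port>:: by placing
    the 16-bit FC00 prefix, the 32-bit IPv4 NAT IP, and the 16-bit port
    in the top 64 bits of the IPv6 address, with zero padding below."""
    ipv4 = int(ipaddress.IPv4Address(nat_ip))
    value = (0xFC00 << 112) | (ipv4 << 80) | (port << 64)
    return ipaddress.IPv6Address(value)

# Hypothetical NAT IP 203.0.113.5 (0xCB007105) and port 2048 (0x0800):
addr = make_ipv6_destination("203.0.113.5", 2048)
# -> fc00:cb00:7105:800::
```

Because the port lands in a fixed 16-bit field of the address, all ports sharing the same initial bits fall under one IPv6 prefix, which is what lets a per-instance port range be advertised as a single route.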
After the packet is encapsulated in an IPv6 header that includes the generated IPv6 address, it is passed to an IPv6 forwarding operation represented by IPv6 routing table 115. A lookup in IPv6 routing table 115 identifies a matching routing table entry 115a that includes a next hop (i.e., next hop 3) interface for forwarding the packet to its destination. In some embodiments, the IPv6 routing table entries for the NAT IP (and LB IP) are dynamic routes learned based on advertisements from BGP instances on the host computers, while in other embodiments, the routing table entries are static routes received from a controller computer in a set of controller computers (e.g., controller computer cluster 140). The IPv6 packet 330a is then forwarded to the destination host based on the IPv6 destination address (i.e., FC00:NAT IP:PortN::).
Fig. 3B illustrates packet processing at gateway device 110, as depicted in fig. 1, for a subsequent packet 320b in the particular packet flow for which packet 320a was the first packet received from external network 101. As shown, packet 320b is received from machine 102 in external network 101, destined for the NAT IP address and port N (i.e., the port selected by the SNAT service as the source port). The packet is first compared to the cache entries in cache 111 (which now contains cache record 111a based on the first packet 320a received from external network 101). Cache record 111a indicates that packet 320b should be encapsulated in an IPv6 header according to a particular encapsulation rule (i.e., "IPv6 encapsulation rule 1"). In some embodiments, the packet and rule identifier are passed to IPv6 encapsulator 114, bypassing pre-routing rules 112. However, one of ordinary skill in the art will appreciate that in other embodiments, the cache record identifies a particular IPv6 address, or an IPv6 encapsulation address and next hop, for encapsulating the packets in the flow based on the rule. IPv6 encapsulator 114 then encapsulates the packet with an IPv6 header based on the identified rule (i.e., rule 114a).
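The cache-first lookup behavior described for packets 320a and 320b can be sketched as follows (a simplified illustration with hypothetical class and rule names; a real gateway data path would also handle cache expiry and more general match criteria, which are omitted here):

```python
class GatewayFlowCache:
    """First-packet path: miss in the cache, match a pre-routing rule by
    destination IP, and install a cache record naming the encapsulation
    rule to use. Subsequent packets hit the cache and bypass the
    pre-routing rule lookup entirely."""

    def __init__(self, pre_routing_rules):
        self.cache = {}                              # flow 5-tuple -> rule id
        self.pre_routing_rules = pre_routing_rules   # dest IP -> rule id

    def lookup(self, flow):
        src_ip, src_port, dst_ip, dst_port, proto = flow
        if flow in self.cache:                       # fast path (packet 320b)
            return self.cache[flow]
        rule = self.pre_routing_rules.get(dst_ip)    # slow path (packet 320a)
        if rule is not None:
            self.cache[flow] = rule                  # install cache record
        return rule

gw = GatewayFlowCache({"203.0.113.5": "ipv6-encap-rule-1"})
flow = ("198.51.100.7", 4242, "203.0.113.5", 2048, "tcp")
first = gw.lookup(flow)    # miss -> pre-routing rule lookup, record created
second = gw.lookup(flow)   # hit -> cached rule, pre-routing rules bypassed
```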
In the illustrated embodiment, the destination IPv6 address is generated by using an FC00 prefix for a locally unique address, followed by the destination IP (i.e., NAT IP), then the destination port (Port N), then zeros. Other prefixes or padding bits are used in other embodiments, and one of ordinary skill in the art will recognize that this is just one of many possible algorithms for generating an IPv6 destination address, chosen here for simplicity. In some embodiments, the IPv6 destination port, source IPv6 address, and IPv6 source port are the original destination port, a source IPv6 address associated with gateway device 110, and an IPv6 source port randomly selected for the particular flow, respectively. However, those of ordinary skill in the art will appreciate that other destination ports and source IPv6 addresses and ports are used in other embodiments without affecting the routing of packets to particular hosts and dSNAT instances.
After the packet is encapsulated in an IPv6 header that includes the generated IPv6 address, it is passed to an IPv6 forwarding operation represented by IPv6 routing table 115. A lookup in IPv6 routing table 115 identifies a matching routing table entry 115a that includes a next hop (i.e., next hop 3) interface for forwarding the packet to its destination. In some embodiments, if cache record 111a identifies a next hop interface to use after encapsulation, the lookup in IPv6 routing table 115 is not performed. In some embodiments, the IPv6 routing table entries for the NAT IP (and LB IP) are dynamic routes learned based on advertisements from BGP instances on the host machines, while in other embodiments, the routing table entries are static routes received from a controller computer in a set of controller computers (e.g., controller computer cluster 140). The IPv6 packet 330b is then forwarded to the destination host based on the IPv6 destination address (i.e., FC00:NAT IP:PortN::).
Fig. 4 illustrates packet processing at gateway device 110, as depicted in fig. 1, for a subsequent packet 420 in a particular packet flow, destined for an LB VIP, for which packets were previously received from external network 101. As shown, packet 420 is received from machine 102 in external network 101, destined for the LB VIP address and port B (i.e., the port associated with the VIP). The packet is first compared to the cache entries in cache 111 (which contains cache record 111b based on the first packet received from external network 101 for the particular packet flow destined for the LB VIP). Cache record 111b indicates that packet 420 should be encapsulated in an IPv6 header according to a particular encapsulation rule (i.e., "IPv6 encapsulation rule 2"). In some embodiments, the packet and rule identifier are passed to IPv6 encapsulator 114, bypassing pre-routing rules 112. However, one of ordinary skill in the art will appreciate that in other embodiments, the cache record identifies a particular IPv6 address, or an IPv6 encapsulation address and next hop, for encapsulating the packets in the flow based on the rule. IPv6 encapsulator 114 then encapsulates the packet with an IPv6 header based on the identified rule (i.e., rule 114b).
In the illustrated embodiment, the destination IPv6 address is generated by using the FC00 prefix for a locally unique address, followed by the destination IP (i.e., the LB VIP), then the source port (Port Z), then zeros. In some embodiments, the source port is used instead of the destination port because the source port is randomly selected from the possible port values when a flow (e.g., session or connection) is initiated and is then constant over the life of the flow. This allows individual flows to be distributed among multiple LB instances based on the different source port ranges allocated to each LB instance, while ensuring that packets for a particular flow are received by the same LB instance that maintains the state information for providing the LB service to that flow. If a destination port were used instead, then in some cases the LB VIP may be associated with multiple servers listening on a particular port or ports, such that the destination port would not identify a particular LB instance. In other embodiments (where the destination port can be used to distinguish between different hosts), the rules are the same as those for the NAT IP in figs. 3A and 3B. Other prefixes or padding bits are used in other embodiments, and one of ordinary skill in the art will recognize that this is just one of many possible algorithms for generating an IPv6 destination address, chosen here for simplicity. In some embodiments, the IPv6 destination port, source IPv6 address, and IPv6 source port are the original destination port, a source IPv6 address associated with gateway device 110, and an IPv6 source port randomly selected for the particular flow, respectively. However, those of ordinary skill in the art will appreciate that other destination and source IPv6 addresses and ports are used in other embodiments without affecting the routing of packets to particular hosts and dLB instances.
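The dLB variant of the address-generation algorithm, which embeds the flow's source port rather than the destination port, might be sketched as follows (hypothetical helper name and example values; the field layout mirrors the dSNAT example):

```python
import ipaddress

def lb_ipv6_destination(lb_vip: str, src_port: int) -> ipaddress.IPv6Address:
    """For the distributed LB, the client's randomly chosen source port
    (constant for the life of the flow) selects the LB instance, so it,
    rather than the shared destination port, is embedded in the address."""
    vip = int(ipaddress.IPv4Address(lb_vip))
    return ipaddress.IPv6Address((0xFC00 << 112) | (vip << 80) | (src_port << 64))

# Hypothetical VIP 203.0.113.9; source port 4660 (0x1234):
a1 = lb_ipv6_destination("203.0.113.9", 4660)  # first packet of a flow
a2 = lb_ipv6_destination("203.0.113.9", 4660)  # same flow -> same address
a3 = lb_ipv6_destination("203.0.113.9", 9999)  # different flow/source port
```

Because every packet of a flow carries the same source port, a1 equals a2 and the flow always reaches the same LB instance, while a different source port (a3) may land in a different instance's advertised prefix.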
In some embodiments, even some stateful distributed services are advertised as being available at each host computer using the same network address. For example, in some embodiments, a stateful distributed load balancing service for distributing requests received from clients in an external network relies on equal-cost multipath (ECMP) operations performed at the gateway devices of an availability zone (AZ) to consistently send the same flow to the same host computer providing the distributed load balancing. To achieve this ECMP operation, in some embodiments, the routing machine on each host computer executing a distributed load balancer instance advertises the same VIP address as available, and the AZ's gateway device records the plurality of advertised next-hop addresses associated with that VIP as possible next hops. For a received data message addressed to the VIP, the AZ gateway device selects a particular next hop using the ECMP operation. In such embodiments, when the number of host computers providing the distributed load balancing service changes, only an acceptable number of redirection operations may be required, such that it is not worthwhile to ensure that a particular host computer can be deterministically identified for each flow (or data message).
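The ECMP behavior relied upon here, hashing a flow's identifying header values so that every packet of the flow selects the same advertised next hop, can be sketched as follows (an illustrative sketch; the function name and flow representation are hypothetical, and real gateways typically perform this hashing in the forwarding hardware):

```python
import hashlib

def ecmp_next_hop(flow, next_hops):
    """Hash the flow 5-tuple so that every packet of a flow is sent to
    the same advertised next hop. Using a stable cryptographic hash
    (rather than Python's randomized hash()) keeps the choice
    deterministic across packets, though not across changes to the
    next-hop set itself."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:8], "big") % len(next_hops)]

hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # advertised next hops for the VIP
flow = ("198.51.100.7", 4242, "203.0.113.9", 80, "tcp")
chosen = ecmp_next_hop(flow, hops)
```

Note that when `next_hops` grows or shrinks, the modulo changes and some flows are redirected to a different host, which is exactly the redirection cost the text describes as acceptable for this service.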
After the packet is encapsulated in an IPv6 header that includes the generated IPv6 address, it is passed to an IPv6 forwarding operation represented by IPv6 routing table 115. The lookup in IPv6 routing table 115 identifies the next hop (i.e., next hop 3) interface for forwarding the packet to its destination. In some embodiments, if cache record 111b identifies a next hop interface to be used after encapsulation, the lookup in IPv6 routing table 115 is not performed. In some embodiments, the IPv6 routing table entries for the LB IP (and NAT IP) are dynamic routes learned based on advertisements from BGP instances on the host machines, while in other embodiments, the routing table entries are static routes received from a controller computer in a set of controller computers (e.g., controller computer cluster 140). The IPv6 packet 430 is then forwarded to the destination host based on the IPv6 destination address (i.e., FC00:LB VIP:PortZ::).
After the gateway device has encapsulated and forwarded the packet, the packet reaches the host computer on which the destination distributed middlebox instance executes. The host computer is configured to receive the encapsulated packet destined for the identified IPv6 address from the gateway device and to remove the encapsulation in order to provide the inner IPv4 packet to a middlebox service instance executing on the host computer, based on the IPv4 addresses in the header of the inner IPv4 packet. Figure 5 conceptually illustrates a process 500, executing at a host computer, for processing received IPv6 packets destined for a middlebox service instance executing on the host computer. In some embodiments, process 500 is performed by the host computer on which the middlebox service instance executes. In some embodiments, the host computer performs process 500 using an MFE (e.g., MFE 121a), a software routing element (e.g., routing machine 123a), an IPv6 processing module (e.g., IPv6 processing module 122a), a distributed middlebox instance (e.g., SNAT instance 124a), and a storage (e.g., NAT records 125a) that stores records for the IPv4 translation operations performed on the host computer. In other embodiments, some elements are combined (e.g., the MFE implements (or is) the software routing element and contains the IPv6 packet processing instructions, while the distributed middlebox service instance includes the records for performing the middlebox service). Process 500 begins by receiving (at 510) an IPv6 packet destined for an IPv6 address associated with a middlebox service instance executing on the host computer.
Process 500 then removes (at 520) the IPv6 encapsulation header and performs a lookup in a routing table to identify a next hop for the inner IPv4 packet. In some embodiments, the received packet is passed to an IPv6 processing module (e.g., IPv6 processing module 122) programmed with IPv6 routing rules and tables. In some embodiments, these routing rules include rules that identify IPv6 packets destined for the middlebox service instance as requiring removal of the IPv6 encapsulation header and delivery to an IPv4 routing table lookup (e.g., performed by the MFE or routing module). In some embodiments, the rule identifies the next hop and indicates that the IPv6 encapsulation should be removed before forwarding the packet.
In other embodiments, after removing the IPv6 encapsulation, a lookup is performed in an IPv4 routing table to identify a next hop toward the middlebox service instance. In some embodiments, the IPv6 processing module is the IPv6 stack of a dual-stack routing element (MFE or routing module) rather than a standalone module. In some embodiments, the lookup is performed in a first virtual routing and forwarding (VRF) context of a first logical network or tenant associated with the middlebox service instance. In some embodiments, a virtual tunnel endpoint (VTEP) receives the encapsulated packet based on its IPv6 address. The VTEP is then responsible for decapsulating the packet and identifying the next hop based on the IPv4 address and a virtual network identifier associated with the received IPv6 packet. The packet is then forwarded (at 530) toward the middlebox service instance, through the identified next-hop interface, with an IPv4 header that uses the IPv4 header values of the packet as received at the gateway device.
The distributed middlebox service instance receives the packet and identifies (at 540) destination inner IPv4 header values for forwarding the data to the correct destination. In some embodiments, identifying the destination inner IPv4 header values includes performing a lookup in a middlebox service record store (e.g., a cache) that maintains associations between the ports used to replace the source ports of outgoing packets and the original source IPv4 addresses and ports. In some embodiments, the ports used to replace the source ports of outgoing packets belong to a range of port numbers assigned to the middlebox service instance executing on the host computer (e.g., ports that the middlebox service instance uses to replace the source ports of outgoing packets so that return traffic is directed to the middlebox service instance). In some embodiments, the destination port is used to perform a lookup (e.g., a query) in the middlebox service record store to identify the internal IPv4 address and port that replace the current (external) destination IPv4 address and port, for forwarding the packet to the correct destination machine (e.g., virtual machine, container, pod, etc.).
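The record-store lookup performed at 540 for a dSNAT instance might look like the following sketch (hypothetical class and field names; a real implementation would also key records by the external machine's IP, as noted elsewhere in this description, and would handle lookup misses):

```python
class SnatRecordStore:
    """Maps the port that replaced an outgoing packet's source port back
    to the original (internal) source IPv4 address and port, so that an
    inbound packet's destination can be rewritten toward the correct
    destination machine."""

    def __init__(self):
        self.records = {}   # replacement port -> (internal IP, internal port)

    def add(self, nat_port, internal_ip, internal_port):
        self.records[nat_port] = (internal_ip, internal_port)

    def translate_inbound(self, packet):
        # The inbound packet's destination port is the replacement port,
        # so it keys the lookup; destination IP/port are then rewritten.
        internal_ip, internal_port = self.records[packet["dst_port"]]
        return {**packet, "dst_ip": internal_ip, "dst_port": internal_port}

store = SnatRecordStore()
store.add(2048, "192.168.10.4", 51515)          # created on the outbound path
inbound = {"src_ip": "198.51.100.7", "src_port": 4242,
           "dst_ip": "203.0.113.5", "dst_port": 2048}
rewritten = store.translate_inbound(inbound)
```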
For a distributed LB instance, for the first packet of a particular flow, identifying the destination internal IPv4 address includes performing a load balancing operation to select the destination machine (either on the same host computer or on a different host computer). In some embodiments, the load balancing operation preferentially selects local compute nodes to provide the services associated with the VIP in order to reduce redirection; however, based on the load balancing algorithm or method, the load balancing operation may select any compute node executing on any host computer, and selects at least one compute node on at least one other host computer for at least one flow. In some embodiments, after the destination machine is selected, the distributed LB instance creates a record in the middlebox service record store identifying the destination IPv4 header values for subsequent packets of the particular flow. For subsequent packets in the flow destined for the LB VIP, the lookup in the middlebox service record store is based on a set of at least one other IPv4 header value (e.g., source IP, source port, source IP/port, etc.).
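The first-packet selection and record-creation behavior of the distributed LB instance can be sketched as follows (an illustrative sketch with hypothetical names; a real implementation would weigh load metrics rather than a simple hash, and would key the record on IPv4 header values as described above):

```python
import hashlib

def select_destination(flow, local_nodes, all_nodes, record_store):
    """First packet of a flow: prefer a local compute node behind the VIP
    to reduce redirection (falling back to the full node list), then
    record the choice so subsequent packets of the flow skip selection."""
    if flow in record_store:                       # subsequent packets
        return record_store[flow]
    pool = local_nodes or all_nodes                # prefer local nodes
    digest = hashlib.sha256(repr(flow).encode()).digest()
    choice = pool[int.from_bytes(digest[:4], "big") % len(pool)]
    record_store[flow] = choice                    # record for the flow
    return choice

records = {}
flow = ("198.51.100.7", 4242, "203.0.113.9", 80, "tcp")
first = select_destination(flow, ["192.168.10.4"],
                           ["192.168.10.4", "192.168.20.8"], records)
again = select_destination(flow, [],
                           ["192.168.10.4", "192.168.20.8"], records)
```

Once the record exists, `again` returns the same destination even though the candidate pools differ, which is the stickiness the record store provides.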
After the internal IPv4 address and port have been identified, the distributed middlebox service replaces (at 550) the external IPv4 address and port with the identified internal IPv4 address and port. The packet is then forwarded (at 560) to the destination machine based on the internal IPv4 address and port. In some embodiments, the packet is forwarded (at 560) through a logical switch that connects the destination machine to the distributed middlebox service instance. In some embodiments, for an LB instance, the identified IPv4 destination connected to the logical switch is a compute node executing on another host computer that also implements the logical switch (i.e., within the span of the logical switch). After the packet is forwarded to the destination machine, the process ends.
Fig. 6 illustrates a packet sent from external machine 102 to a Guest Machine (GM) 126 in the internal network. The original data message is sent with an IP header 650a that specifies a destination IP (DIP) address associated with the dSNAT middlebox service and a destination port (DPort) within the range allocated to the particular dSNAT middlebox service instance 124a. The source IP address (SIP, Ext IP 1) and port (Port Y) are associated with external machine 102. At gateway device 110, the IPv4 packet (i.e., inner packet 670) sent by external machine 102 is encapsulated with IPv6 header 660 based on the process discussed above with respect to figs. 2-3B. The resulting packet has a destination IPv6 address (e.g., FC00:NAT IP:PortN::) associated with host computer 120a and NAT instance 124a, and is sent to host computer 120a through intermediate fabric 150.
The IPv6 encapsulated packet is then received at a Managed Forwarding Element (MFE) 121a and passed to an IPv6 processing module 122a for IPv6 processing. In other embodiments, the IPv6 processing is performed in the IPv6 stack in the routing machine 123a or dual stack MFE. In some embodiments, IPv6 processing includes removing IPv6 encapsulation and returning IPv4 packets to MFE 121a. In other embodiments, the IPv6 processing includes identifying NAT instance 124a based on the IPv6 header value and removing the encapsulation. In embodiments where the internal packet 670 is returned to the MFE 121a, the MFE 121a identifies the NAT instance 124a as the destination of the internal packet 670 based on the IPv4 header.
NAT instance 124a receives the original (or inner) IPv4 packet and performs a lookup in NAT record store 127a. NAT record store 127a includes record 680, which associates the port number used to replace the source port with the replaced source port and the replaced source IPv4 address. In some embodiments, the port number used to replace the source port is also associated with the IP address of the external machine, such that the same port number may be reused for connections to different external machines. In some embodiments, the lookup is based on the destination port of the IPv4 packet. The result of the lookup in NAT record store 127a is then provided to NAT instance 124a for replacement of the destination IPv4 address and port number. The IPv4 packet with the translated address (i.e., with IPv4 header 650b) is then forwarded to the destination machine (e.g., GM 126a).
Fig. 7 conceptually illustrates a process 700 performed by a NAT instance (e.g., NAT instance 124a) to process a first packet in a packet flow destined for machine 102 in external network 101. The discussion of fig. 7 will refer to the elements of fig. 8 to provide an example of the operations of fig. 7. Fig. 8 illustrates a first packet in a packet flow being sent from a GM and processed by a NAT instance executing on the same host computer. Process 700 begins by receiving (at 710) a packet destined for a machine in the external network. The received packet has a source IPv4 address associated with a source machine in the internal network and a source port selected by the source machine. For example, packet 831, received from GM 126a at NAT instance 124a, is destined for an external IP (i.e., Ext IP 1) and destination port (i.e., Port X) and has the source IP address (i.e., GM IP) and source port (i.e., Port Y) of GM 126a. In some embodiments, the source port of packet 831 is randomly selected from the entire source port range (0-65535).
The process selects (at 720) a source port number (e.g., Port N) from a range of available port numbers assigned to the NAT instance. In some embodiments, these available port numbers are the port numbers in the assigned range that have not been selected for a currently active connection. In some embodiments, the port number ranges are assigned by the controller computer cluster. In some embodiments, the port number range shares a first set of common bits that are not shared by the port numbers assigned to other NAT instances on other host computers. For example, the port numbers in each of the 1024-port ranges 0-1023 and 2048-3071 share a different set of 6 common initial bits of the 16-bit port number. Larger or smaller allocated ranges have fewer or more common bits, respectively.
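The relationship between an assigned port range and its shared initial bits can be checked with a short sketch (illustrative only; the helper name is hypothetical):

```python
def range_prefix_bits(start, size):
    """A block of 2**k contiguous ports aligned on a 2**k boundary shares
    its top (16 - k) bits; those shared bits identify the NAT instance.
    Returns (shared prefix value, number of shared bits)."""
    k = size.bit_length() - 1
    assert size == 1 << k and start % size == 0, "aligned power-of-two range"
    return start >> k, 16 - k

# The two 1024-port ranges mentioned in the text:
prefix_a = range_prefix_bits(0, 1024)      # ports 0-1023
prefix_b = range_prefix_bits(2048, 1024)   # ports 2048-3071
```

Both ranges share 6 initial bits, but the shared values differ, so each range (and hence each NAT instance) maps to a distinct IPv6 address prefix under the address-generation scheme described earlier.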
After selecting (at 720) the port number, process 700 creates (at 730) a connection trace record (e.g., NAT record) in a connection tracker (e.g., NAT record 127) that associates the selected port number with the source IP address and source port of the packet for which the port number was selected. In fig. 8, NAT record 841 associates the selected Port (Port N) with the source IP address (GM IP) and source Port (Port Y) of received packet 831. In some embodiments, the port number used to replace the source port is also associated with the IP address of the external machine, such that the same port number can be reused for connections to different external machines. As described above, the connection trace record is used to translate the NAT IP and port number of a packet received from the external network back to the internal IP and port number of the original connection associated with the destination port number of the received packet.
After creating (at 730) the connection trace record, process 700 replaces (at 740) the source IP address and source port number with a particular IP address in the set of IP addresses assigned to the distributed NAT service and the selected port number. For simplicity, the examples throughout this specification assume that the distributed NAT service is assigned a single external IP address, i.e., the NAT IP. Those of ordinary skill in the art will appreciate that the methods discussed apply similarly to a plurality of IP addresses associated with a distributed NAT service (or any other distributed middlebox service that uses a set of external IP addresses). For example, the source IP address and port of packet 831 are replaced with the NAT IP and the selected port number to produce serviced packet 832, which is then forwarded (at 750) to the destination, and process 700 ends. In some embodiments, forwarding the serviced packet to the destination includes forwarding the serviced packet to an MFE (e.g., MFE 121a) for forwarding to the external destination.
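Operations 720-740 of process 700 can be sketched together as follows (a simplified illustration with hypothetical names; port exhaustion, per-external-machine port reuse, and record expiry are omitted):

```python
class SnatInstance:
    """Outbound path of process 700: pick a free port from the instance's
    assigned range (at 720), record the mapping in the connection tracker
    (at 730), and rewrite the packet's source IP and port (at 740)."""

    def __init__(self, nat_ip, port_range):
        self.nat_ip = nat_ip
        self.free_ports = set(port_range)  # ports not used by active connections
        self.conntrack = {}                # selected port -> (orig IP, orig port)

    def translate_outbound(self, packet):
        port = self.free_ports.pop()                                  # select
        self.conntrack[port] = (packet["src_ip"], packet["src_port"])  # record
        return {**packet, "src_ip": self.nat_ip, "src_port": port}     # replace

snat = SnatInstance("203.0.113.5", range(2048, 3072))  # assigned 1024-port range
pkt = {"src_ip": "192.168.10.4", "src_port": 51515,
       "dst_ip": "198.51.100.7", "dst_port": 80}
out = snat.translate_outbound(pkt)
```

The connection-tracker entry created here is exactly what the inbound lookup consults to translate the NAT IP and port back to the original internal IP and port.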
In some embodiments, packets destined for the external network, processed by the middlebox service instance executing on the host computer, are sent from the host computer without encapsulation in an IPv6 header using the advertised IPv6 address as the source IP address. Although the IPv6 address prefix associated with the middlebox service instance is not used to encapsulate the packet, in some embodiments, other encapsulation may be used to reach the gateway device or external destination machine. In some embodiments, packets processed by a middlebox service instance are encapsulated in an IPv6 header using an IPv6 source address associated with the middlebox service instance. Fig. 8 illustrates that the serviced packet 832 is encapsulated in an overlay encapsulation header 870 for transmission over the overlay network to which the GM 126a and NAT instance 124a belong (e.g., using an overlay network identifier such as a Virtual Network Identifier (VNI)). In some embodiments, the MFE or Virtual Tunnel Endpoint (VTEP) performs encapsulation of the serviced packet 832 to produce an encapsulated packet 833.
In some embodiments, a cluster of controller computers (i.e., a set of one or more controller computers) of a first network provides configuration information to network elements to facilitate the middlebox service operations of the first network. Figure 9 conceptually illustrates a process 900 for generating configuration data for the different network elements that provide, or facilitate the provision of, middlebox services. In some embodiments, process 900 is performed by a controller computer or controller computer cluster. In some embodiments, process 900 is performed each time a new middlebox service instance is started (spun up).
Process 900 begins by identifying (at 910) the middlebox service instances in the first network. In some embodiments, identifying the middlebox service instances includes identifying (1) active middlebox service instances and (2) middlebox service instances that have been requested to be activated (e.g., turned on). In some embodiments, identifying the middlebox service instances includes identifying the number of end machines (e.g., workload VMs, containers, etc.) that each middlebox service instance supports (i.e., provides the middlebox service to). In some embodiments, in addition to identifying the number of end machines, identifying the middlebox service instances further includes identifying either or both of (1) the total number of connections being handled by the distributed middlebox service (i.e., the sum over all distributed middlebox instances) and (2) the number of connections being handled by each middlebox service instance.
After identifying (at 910) the middlebox service instance, the process 900 determines (at 920) the number of port ranges or the size of the port ranges that will be available for allocation. In some embodiments, the number of port ranges or the size of the port ranges is determined based on input from the first network or a user (e.g., administrator) of the logical network within the first network. The input from the user may be based on a maximum amount of resources the user desires the middlebox service instance to consume in providing the middlebox service. In some embodiments, the user input specifies any or all of the following: (1) a maximum number of middlebox service instances that can be instantiated, (2) a maximum number of ports that can be allocated to a single middlebox service instance, or (3) a policy for determining the number of ports allocated to a particular middlebox service instance. In some embodiments, these policies are based on any or all of the following: (1) a number of active middlebox service instances, (2) a number of compute nodes for which each active middlebox service instance provides a distributed middlebox service, (3) a number of connections being handled by the distributed middlebox service, and (4) a number of connections being handled by each middlebox service instance.
For example, a policy may specify that the entire range of possible port numbers be divided into a power-of-two number of equally sized portions, where that power of two is at least twice the number of middlebox service instances, and that as the number of middlebox service instances increases or decreases, the port number ranges are adjusted based on the policy (e.g., going from 4 to 5 middlebox service instances results in each of the 8 port ranges being divided into two smaller port number ranges, while going from 17 to 16 middlebox instances results in the 64 port number ranges being merged into 32 port number ranges). In some embodiments, the policy specifies that each middlebox service instance is assigned a non-contiguous range of port numbers (e.g., 0-8191 assigned to a first middlebox service instance, 16384-24575 assigned to a second middlebox service instance, etc.). Such a policy allows the number of host computers to increase and decrease without having to reassign port number ranges frequently.
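The power-of-two splitting policy in the example above can be sketched as follows (hypothetical function name; the non-contiguous, gap-leaving assignment of every other range is one possible choice consistent with the example in the text):

```python
def port_ranges_for(instance_count, port_space=65536):
    """Split the 16-bit port space into the smallest power-of-two number
    of equal ranges that is at least twice the instance count, then hand
    out alternating (non-contiguous) ranges, leaving gaps for growth."""
    pieces = 1
    while pieces < 2 * instance_count:
        pieces *= 2
    size = port_space // pieces
    starts = [i * size for i in range(pieces)]
    return {inst: (starts[2 * inst], starts[2 * inst] + size - 1)
            for inst in range(instance_count)}

alloc = port_ranges_for(4)   # 4 instances -> 8 ranges of 8192 ports each
```

With 4 instances, the first instance receives 0-8191 and the second 16384-24575; going to 5 instances would force 16 pieces, halving each range, just as the example describes.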
In some embodiments, the policy may specify that: (1) when the fraction of the port numbers assigned to a particular middlebox service instance that is in use by that middlebox service instance is higher than a threshold fraction (e.g., 0.8 or 0.9), an adjacent available range will be assigned to it, workload compute nodes will be migrated from the host computer on which the middlebox service instance executes, or a new middlebox service instance will be brought up (e.g., activated on another host computer); (2) when the fraction of the assigned port numbers in use by a particular middlebox service instance is lower than a threshold fraction (e.g., 0.3 or 0.2), the assigned range of port numbers will be reduced, or additional end machines will be migrated to the host computer on which the middlebox service instance executes (e.g., from a host computer whose middlebox service instance uses a greater fraction of its assigned port numbers); and (3) when the total number of connections being handled by the distributed middlebox service is lower than a threshold fraction of the capacity (which is based on the number of middlebox service instances and the assigned port number ranges), the connections will be redistributed among fewer middlebox service instances, or some middlebox service instances will be disabled. Other policies may specify that port ranges be allocated based on the number of workload compute nodes serviced by a middlebox service instance (e.g., 256 port numbers are allocated for 0-10 workload compute nodes, 512 port numbers for 11-20 workload compute nodes, etc.). Those of ordinary skill in the art will appreciate that these policies are merely examples of possible policies, and that different policies are used in different embodiments depending on the requirements of the user.
After identifying (at 910) the middlebox service instances and determining (at 920) the port number ranges, the process 900 selects (at 930) at least one port range for allocation to each middlebox service instance. As described above, in some embodiments the initial allocation assigns non-contiguous, non-overlapping port ranges to each middlebox service instance. In some embodiments, a subsequent allocation assigns at least one additional range of port numbers to a specific middlebox service instance that is using more than a threshold fraction of its allocated port numbers. In some embodiments, other subsequent allocations remove a portion of the initially allocated range of port numbers from a particular middlebox service instance that is using fewer than a threshold number of port numbers in that range.
In some embodiments, the size of the range of port numbers assigned to middlebox service instances is fixed by an administrator based on the maximum expected number of middlebox service instances (e.g., for a maximum of 64 expected middlebox service instances, 64 different ranges of port numbers are created, each including 1024 ports, with each range assigned to one middlebox service instance at startup). In other embodiments, the size of the port number range is dynamic and may vary based on the number of active middlebox service instances, active connections, or workload compute nodes using the middlebox service. The size of the port number range may also vary between middlebox service instances. For example, a larger range of port numbers may be assigned to a first middlebox service instance executing on a host computer that executes a larger number of workload compute nodes using the middlebox service than to a second middlebox service instance executing on a host computer with fewer such workload compute nodes, and the assignments may change as the number of workload compute nodes changes.
After selecting (at 930) the port number range to use for each middlebox service instance, process 900 generates (at 940) configuration data for implementing the desired middlebox service instances. In some embodiments, the configuration data includes multiple sets of configuration data for different network elements (e.g., host computers, gateway devices) and for different purposes. Fig. 10 illustrates a cluster 1040 of controller computers in a data center 1005 sending different types of configuration data to different network elements. The elements of fig. 10 are generally identical to those discussed in fig. 1, with the addition of a local controller 1028 that receives configuration data from the controller cluster 1040. Fig. 10 illustrates a set of configuration data 1029 (received at local controller 1028) for each host computer 1020. In some embodiments, configuration data 1029 includes configuration information for: (1) configuring a middlebox service instance to provide the middlebox service, (2) configuring other network elements (e.g., GM 1026 and MFE 1021) executing on the host computer to communicate with the middlebox service instance (e.g., 1024), and (3) configuring the MFE or a BGP instance executing on the host computer to advertise an IPv6 address associated with the middlebox service instance executing on the host computer. In some embodiments, local controller 1028 receives the configuration data and identifies the configuration data for each module executing on host computer 1020, as will be explained with respect to fig. 12. In some embodiments, the controller computer cluster 1040 also sends configuration data 1019 to the group of gateway devices for configuring the gateway devices to perform IPv4-to-IPv6 encapsulation and, in some embodiments, for configuring the gateway devices with IPv6 routing table entries.
The configuration data (e.g., configuration data 1029) includes configuration data for configuring at least one middlebox service instance executing on at least one host computer to provide the middlebox service using the assigned port number range. In some embodiments, the configuration data for initializing a new middlebox service instance on the host computer includes the IPv4 address associated with the middlebox service (e.g., the address used to replace the source IP address of packets sent from the first network to the external network) and the assigned port number range used in performing middlebox service operations. In some embodiments, additional configuration information (e.g., the logical overlay network elements to which the middlebox instance is connected) is sent to the host computer to configure other elements of the host to communicate with the new middlebox service instance, as will be appreciated by those of ordinary skill in the art.
In some embodiments, the additional configuration data sent to the host computer includes configuration data for configuring the host computer (or an MFE or BGP instance executing on the host computer) to identify and advertise the IPv6 address prefix associated with the middlebox service instance, as described with respect to fig. 12. As discussed with respect to fig. 12, in some embodiments the configuration data also includes information used internally to the host computer for addressing the middlebox service instance and for configuring machines executing on the host computer to use the middlebox service instance for particular packets (e.g., packets destined for external networks).
In some embodiments, the generated configuration data includes configuration data (e.g., configuration data 1019) generated for provision to the gateway device. In some embodiments, the gateway device is a partially or fully programmable gateway device that can be programmed by the controller computer cluster to effect IPv4-to-IPv6 translation and encapsulation according to PBR rules specified based on the IPv4 address in the IPv4 header and the destination port. In other embodiments, the gateway device is an off-the-shelf gateway device (e.g., a dual-stack router) capable of simple programming sufficient to configure it to implement IPv4-to-IPv6 encapsulation.
For both programmable gateway devices and off-the-shelf gateway devices, the configuration data includes what will be referred to as a set of middlebox service records and a set of IPv6 routing table entries. In some embodiments, a middlebox service record maps a combination of an IPv4 address and a destination port number used by a particular middlebox service operation to an IPv6 destination address. In some embodiments, the middlebox service record is provided as a lookup table along with instructions for using the lookup table to route data messages addressed to the IPv4 address used by the distributed middlebox service. In some embodiments, the middlebox service record is a PBR rule (or similar rule or policy) that defines an algorithm for generating an IPv6 address from an IPv4 destination address and a port number. In some embodiments, the PBR rules specify only an IPv4 destination address to which the algorithm should be applied, while in other embodiments both an IPv4 address and a port number are specified. In some embodiments, the middlebox service record is a set of instructions for configuring the off-the-shelf gateway device to perform IPv6 encapsulation, according to a specified algorithm, on IPv4 packets destined for the IPv4 address used by a particular middlebox service operation. In some embodiments, the instructions are based on functionality (e.g., exposed APIs) provided by the off-the-shelf gateway device.
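As one concrete illustration of the lookup-table form of a middlebox service record, the controller could pre-compute (IPv4 address, destination port) to IPv6 mappings that the gateway consults before encapsulating. The table entries below are hypothetical values, not taken from the patent:

```python
# Hypothetical pre-computed middlebox service records, keyed by the
# service's IPv4 address and the destination port of the packet.
SERVICE_RECORDS = {
    ("198.168.32.1", 1): "fd00:c6a8:2001:1::",
    ("198.168.32.1", 2): "fd00:c6a8:2001:2::",
}

def encap_destination(dst_ip, dst_port):
    # Return the IPv6 address to encapsulate toward, or None if the
    # packet is not destined for a distributed middlebox service.
    return SERVICE_RECORDS.get((dst_ip, dst_port))
```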
In some embodiments, each IPv6 routing table entry identifies an IPv6 address prefix associated with a particular host computer of a set of multiple host computers (each host computer executing a middlebox service instance) and a next-hop interface for reaching the particular host computer. The IPv6 address prefix specified for a particular host computer in the IPv6 routing entry is based on the IPv4 address associated with the distributed middlebox service and the range of port numbers assigned to the distributed middlebox service instance executing on the host computer. If multiple non-adjacent port ranges are assigned to a particular host computer, the set of IPv6 routing table entries includes multiple entries for the particular host computer.
After the configuration data is generated (at 940), the configuration data generated for each network element is forwarded (at 950) to the appropriate network element for configuring that network element, as described with respect to figs. 11 and 12. In some embodiments, the configuration data is received at the host computer by a local controller (e.g., local controller 1028) or local controller agent that communicates with the controller computer cluster using control plane messages. The local controller then provides the configuration data or configures elements on the host computer to implement the middlebox service (e.g., instantiates a middlebox service instance, configures the GM to use the middlebox service instance, configures the MFE to advertise an IPv6 address prefix associated with the middlebox service instance, etc.). Configuration data generated for the gateway device is forwarded to the gateway device to configure it to identify the particular host computer associated with a particular received packet (e.g., by using the provided IPv6 routing table entries). After forwarding (at 950) the configuration data, the process ends. Those of ordinary skill in the art will appreciate that in some embodiments, process 900 is performed for each distributed middlebox service that uses the same IPv4 address at each of a plurality of distributed middlebox service instances as the source address for outgoing packets.
In some embodiments, the controller computer cluster monitors, periodically or based on a schedule, the load on individual middlebox service instances and on the middlebox service in the aggregate. In some embodiments, the monitoring is based on a program executing on the same host computer as the middlebox service instance. In some embodiments, the program monitors a set of metrics associated with the middlebox service instance (e.g., delay, number of connections processed, number of packets per second, number of end machines using the middlebox service instance, etc.). In some embodiments, operations 910 and 920 are performed each time a request is made to initialize a new middlebox service instance or workload machine. In some embodiments, operations 910 and 920 are also performed periodically, or based on a schedule set by an administrator, to determine whether the monitoring information indicates that the size of any port number range needs to change or that any port number range needs to be reassigned. If such a change is needed, operations 930-950 are performed to update the allocation of port ranges and provide updated configuration data to the network elements.
Figure 11 conceptually illustrates a process 1100 performed by a gateway device to facilitate providing middlebox services based on received configuration data. Process 1100 begins by receiving (at 1110) configuration data for facilitating provision of middlebox services in an internal network. As discussed with respect to fig. 9 and 12, the configuration data in some embodiments is received from the controller computer cluster (e.g., configuration data 1019) or as an advertisement of availability of an IPv6 address prefix (e.g., advertisement 1039). In some embodiments, the controller cluster provides a portion of the configuration data while the advertisement includes a different portion of the configuration data. For example, in some embodiments, the controller computer provides configuration data related to middlebox service records as described with respect to fig. 9, while IPv6 routing table entry configuration data is received from the host computer through an advertisement of IPv6 address prefixes available at the host computer, as described with respect to fig. 12. In embodiments where the advertisement made by the host computer provides configuration information for the routing table entry, the gateway device receives additional configuration information in the form of an additional advertisement of the IPv6 address prefix of the new middlebox service instance when the new middlebox service instance comes online.
Based on the received configuration data (at 1110), the gateway device creates (at 1120) at least one IPv6 routing table entry for the received IPv6 address prefix. In some embodiments, the routing table entry is a static entry provided by the controller computer cluster, which may be updated by the controller computer cluster when the configuration data changes. In other embodiments, the routing table entries are dynamic routing table entries created based on BGP or other route learning protocols known to those of ordinary skill in the art.
Process 1100 also creates (at 1130) a middlebox service record based on the configuration data. As discussed with respect to figs. 9 and 12, the middlebox service record may be any type of record or rule that identifies a packet destined for an IPv4 address associated with the middlebox service as requiring IPv6 encapsulation and that enables the gateway device to identify the correct IPv6 destination address for reaching the middlebox service instance associated with the packet. For example, in some embodiments the middlebox service record is any of the following: (1) a PBR rule that identifies a packet destined for an IPv4 address associated with a middlebox service as requiring IPv6 encapsulation and specifies a method for generating the IPv6 destination, (2) a set of records for a lookup table that identifies IPv6 destination addresses for a set of combinations of IPv4 addresses and destination ports, or (3) API instructions for APIs exposed by an off-the-shelf gateway device for implementing a programmed encapsulation of IPv4 packets, destined for an IPv4 address associated with a middlebox service, into IPv6 packets. Those of ordinary skill in the art will appreciate that the creation of the routing table entry (at 1120) and the creation of the middlebox service record (at 1130) may be performed simultaneously or in the reverse of the order described in process 1100. After the routing table entry and middlebox service record have been created, the process ends. However, one of ordinary skill in the art will appreciate that process 1100, or a portion thereof, is performed each time new configuration data is received at the gateway device. For example, each time a host computer advertises a new IPv6 address prefix, a new IPv6 routing table entry is created in the gateway device.
In some embodiments, the host computer is configured to advertise the availability of an IPv6 address prefix based on the IPv4 address of the middlebox service (which serves as the source address for packets sent from the first network to the external network) and the range of port numbers assigned to the middlebox service instance.
Figure 12 conceptually illustrates a process 1200 for configuring a host computer to execute a distributed middlebox service instance and advertise an IPv6 address prefix associated with the middlebox service instance executing on the host computer. In some embodiments, process 1200 is performed by a host computer (e.g., host computer 1020) executing a local controller (e.g., local controller 1028) and BGP instance (e.g., MFE 1021 or routing machine 1023).
Process 1200 begins by receiving (at 1210) configuration information related to a middlebox service instance executing on a host computer. In some embodiments, the configuration information related to the middlebox service instance includes (1) the IPv4 address used by the middlebox service to replace the source address of packets sent from within the first network to machines in the external network, and (2) the port number range assigned to the middlebox service instance executing on the host computer. In other embodiments, the configuration information also includes information used internally to route packets to the middlebox service instance, such as the IP and MAC addresses or next-hop interfaces used to forward packets to the middlebox service instance. In some embodiments, the IPv6 address associated with the middlebox service instance is also included in the configuration information. In some embodiments, the configuration information is received from a controller computer in a cluster of controller computers that configures the elements of the first network.
After receiving (at 1210) the configuration information, process 1200 identifies (at 1220) configuration data for configuring different components (e.g., machines, MFEs, filters, containers, pod, etc.) executing on the host computer. For example, the middlebox service instance requires configuration data that includes an external IP address associated with the middlebox service, a port range assigned to the middlebox service instance, and in some embodiments, an IP and Media Access Control (MAC) address associated with the middlebox service instance. In some embodiments, the BGP instance (e.g., MFE 1021 or routing machine 1023) needs to know the IP address associated with the middlebox service and the port range assigned to the middlebox service instance. In other embodiments, IPv6 address prefixes are provided to BGP instances for advertising. Other network elements need to be configured with information for interacting with the middlebox service instance, such as policies for identifying packets requiring middlebox services, MAC addresses associated with the middlebox service instance, and other information that will be apparent to those of ordinary skill in the art.
After identifying (at 1220) the configuration data for each component, process 1200 configures (at 1230) the middlebox service instance with the IP address and port range associated with the middlebox service instance. If the middlebox service instance is being configured for the first time (i.e., being instantiated), the configuration data includes additional information such as the IP and MAC addresses associated with the middlebox service instance. In some embodiments, the configuration data includes the IP address associated with the middlebox service only when the middlebox service instance is instantiated, and subsequent updates include only the updated port ranges.
After configuring (at 1230) the middlebox service instance, process 1200 provides (at 1240) the BGP instance executing on the host computer with an IPv6 address prefix identified as being associated with the middlebox service instance based on the configuration information. In some embodiments, the identified IPv6 address prefix is based on the IPv4 address used by the middlebox service (included in the configuration information) and the range of port numbers assigned to the middlebox service instance. In some embodiments, the assigned port number range is a range of port numbers that share a common set of leftmost bits. For example, an allocated range of 1024 port numbers consists of port numbers sharing their 6 leftmost bits, and an allocated range of 512 port numbers consists of port numbers sharing their 7 leftmost bits. In such embodiments, the IPv6 address prefix associated with the middlebox service instance is then identified as an IPv6 address prefix whose rightmost significant bits are the set of bits common to the port numbers assigned to the middlebox service instance. In some embodiments, the advertised IPv6 address is based on existing functionality of the hardware gateway device for handling IPv6 encapsulation of IPv4 packets.
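The leftmost-bits relationship can be checked with a short helper (our own sketch; 16-bit port numbers and power-of-two-aligned ranges are assumed):

```python
def common_leftmost_bits(range_size):
    # Number of leading bits shared by all ports in an aligned
    # power-of-two range of 16-bit port numbers.
    assert range_size & (range_size - 1) == 0, "range size must be a power of two"
    return 16 - (range_size.bit_length() - 1)
```

A range of 1024 ports shares its 6 leftmost bits and a range of 512 ports shares its 7 leftmost bits, matching the example above.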
Fig. 13 illustrates three different exemplary advertised IPv6 address prefixes 1331-1333 and corresponding exemplary addresses 1341-1343; the IPv6 address prefixes 1331-1333 are used in different embodiments to advertise the availability of a service at the host computer, and the exemplary addresses 1341-1343 are generated by the gateway device for use in an IPv6 encapsulation header to forward a packet to the particular service instance executing on the host computer making the advertisement. Fig. 13 shows a set of identified IPv6 address prefixes 1331-1333 that are located in the FC00::/8 address block, based on the configuration of the middlebox service instance. In some embodiments, other IPv6 prefixes in other address blocks, such as the FD00::/8 address block, are used. The illustrated IPv6 prefixes include the IPv4 address 1310 used by the middlebox service and the common bits (i.e., the first 6 bits in the illustrated example) of the port number range 1315. The illustrated example is for a middlebox service that uses an IPv4 address 1310 of 198.168.32.1 (or C6A8:2001 in hexadecimal) and a port range 1315 of 0-1023 (with 6 common "0" bits). This example also illustrates a user-configured prefix 1320 (e.g., F462:5D1C:A451:2BD6) that, in some embodiments, is used to distinguish packets received for different tenants or logical networks implemented in the same data center. In some embodiments, the user-configured 64-bit prefix is randomly generated such that the prefixes generated for different tenants are unlikely to be identical.
An exemplary IPv6 prefix 1331 (i.e., FCC6:A820:0100::/46) is generated using the first 8 bits of the FC00::/8 prefix, followed by the 32 bits of the service's IPv4 address 1310, and finally the port-range prefix common to all port numbers in the port range 1315. Similarly, IPv6 prefix 1332 (i.e., FD00:C6A8:2001::/54) is generated using the first 16 bits of the FD00::/8 prefix, followed by the 32 bits of the service's IPv4 address 1310, and finally the port-range prefix common to all port numbers in the port range 1315. Alternatively, IPv6 prefix 1333 (i.e., FC00:&lt;user-configured 64-bit prefix&gt;:C6A8:2001::/118) is generated using the first 16 bits of the FC00::/8 prefix, followed by the user-configured 64-bit prefix 1320, followed by the 32 bits of the service's IPv4 address 1310, and finally the port-range prefix common to all port numbers in the port range 1315.
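The construction of prefix 1332 can be reproduced with Python's `ipaddress` module. This is a sketch under the stated layout (16 bits of FD00::/8, then the 32-bit service IPv4 address, then the bits common to the assigned port range); the service address used here is the one whose hexadecimal form is C6A8:2001:

```python
import ipaddress

def advertised_prefix(service_ipv4, port_start, range_size):
    # FD00 (16 bits) + service IPv4 (32 bits) + common port-range bits,
    # e.g. a /54 prefix for a 1024-port range.
    common = 16 - (range_size.bit_length() - 1)   # 6 bits for 1024 ports
    ip = int(ipaddress.IPv4Address(service_ipv4))
    prefix_len = 16 + 32 + common
    value = (0xFD00 << 112) | (ip << 80) | ((port_start >> (16 - common)) << (80 - common))
    return ipaddress.IPv6Network((value, prefix_len))
```

For the figure's service address and port range 0-1023 this yields fd00:c6a8:2001::/54; a second instance holding ports 1024-2047 would advertise fd00:c6a8:2001:400::/54.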
Fig. 13 also shows IPv6 destination addresses 1341-1343 for an exemplary received packet having header values 1302 corresponding to the advertised IPv6 prefixes 1331-1333. An exemplary IPv6 destination address 1341 (i.e., FCC6:A820:0100:0000:01::) is generated by the gateway device using the first 8 bits of the FC00::/8 prefix, followed by the 32 bits of the destination IPv4 address 1350, and finally the destination port number 1355. Similarly, IPv6 destination address 1342 (i.e., FD00:C6A8:2001:0000:0001::) is generated using the first 16 bits of the FD00::/8 prefix, followed by the 32 bits of the destination IPv4 address 1350, and finally the destination port number 1355. Alternatively, IPv6 destination address 1343 (i.e., FC00:F462:5D1C:A451:2BD6:C6A8:2001:0001) is generated using the first 16 bits of the FC00::/8 prefix, followed by the user-configured 64-bit prefix 1320, followed by the 32 bits of the destination IPv4 address 1350, and finally the destination port number 1355.
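The gateway-side address generation for the FD00-style scheme might look like the following sketch. As an assumption of ours, the 16-bit destination port is placed immediately after the IPv4 address so that the generated address falls under the advertised port-range prefix; the exact bit layout in the patent's figure may differ:

```python
import ipaddress

def encap_address(dst_ipv4, dst_port):
    # FD00 (16 bits) + destination IPv4 (32 bits) + destination port
    # (16 bits), with the remaining low bits zero.
    ip = int(ipaddress.IPv4Address(dst_ipv4))
    value = (0xFD00 << 112) | (ip << 80) | (dst_port << 64)
    return ipaddress.IPv6Address(value)
```

Any port in the range 0-1023 then produces an address covered by the instance's /54 prefix, so the gateway's longest-prefix lookup delivers the packet to the right host.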
The BGP instance then advertises to the gateway device that the provided IPv6 address prefix associated with the middlebox service instance is available at the host computer. In some embodiments, the advertisement includes instructions for the gateway device to identify an IPv6 address based on the IPv4 address and port number of a packet received at the gateway device and to encapsulate the IPv4 packet with the identified IPv6 address. In other embodiments, the advertisement relies either on (1) existing functionality of the hardware gateway device for handling IPv6 encapsulation of IPv4 packets, or on (2) a central controller cluster that configures the gateway device to perform IPv6 encapsulation based on the IPv4 address and port number. In these other embodiments, specific configuration instructions from the BGP instance are not required. In some embodiments, the advertisement uses the Border Gateway Protocol (BGP). In other embodiments, other proprietary or non-proprietary route advertisement protocols or methods (IS-IS, OSPF, FRR, etc.) are used to inform other routers of the availability of a particular network address at a host computer. In some embodiments, the advertisement is made to a route server (or route reflector) that identifies the advertised IPv6 address as available at the host computer to the gateway device and other forwarding elements of the internal network.
In some embodiments, the route server also uses BGP advertisements to identify the advertised IPv6 address as available at the host computer, and simplifies the exchange of routing information in the network by minimizing the number of peerings between different network elements that would otherwise be necessary to propagate routing information in a full-mesh configuration. However, one of ordinary skill in the art will appreciate that other methods of advertising the availability of an IP address may be used, and that the description of BGP instances and the BGP protocol is merely one example of a protocol for advertising routing information, used herein only as an example.
Process 1200 then continues with configuring (at 1250) other components (e.g., machines, MFEs, filters, containers, pod, etc.) executing on the host computer based on the received configuration data. In some embodiments, the other components are configured to direct traffic to the middlebox service instance based on policies included in the configuration data that identify traffic destined for the external network. Those of ordinary skill in the art will appreciate that the order in which components are configured presented above is not the only possible order, and in other embodiments, components are configured in parallel (e.g., simultaneously) or in an order different from the order presented above.
Fig. 14 conceptually illustrates a set of data exchanges between network elements to migrate a VM (e.g., a guest virtual machine (GVM)) that communicates with an external machine using a distributed SNAT (DSNAT) middlebox service. Fig. 14 shows a controller computer 1440 that initiates the migration of the VM, a source host computer 1420a on which the VM originally executes, a destination host computer 1420b to which the VM is migrated, and a route server 1430 that is used by the host computers as a central BGP server to propagate routing information (e.g., the IP address prefixes available at each host computer). The data exchange begins (at 1410) with the controller computer initiating the VM migration by sending configuration data to each of the source host computer 1420a and the destination host computer 1420b identifying the VM being migrated.
The source host computer then identifies (at 1411) the set of ports, for a set of middlebox services, used by the active connections of the migrated VM, in order to notify the destination host computer 1420b of the migrated ports. The set of ports identified to the destination host computer 1420b includes NAT records associating the identified ports with the internal IPv4 addresses and ports used by the migrated VM. In some embodiments, the identified ports are placed on a list of unused ports at the source host computer 1420a until the migrated VM releases them (e.g., when the active connections using the identified ports end). In some embodiments, the source host computer 1420a also adds these port numbers to a redirection table to redirect any packets received for those port numbers to the destination host computer 1420b. In some embodiments where port numbers are reused for different source and destination IP address pairs, the redirect instruction specifies the particular communications destined for the migrated VM, rather than redirecting all packets using a particular port number.
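The source host's bookkeeping during such a migration can be sketched as follows (the class and method names are ours; the patent does not name these structures):

```python
class SourcePortState:
    # Tracks DSNAT ports on the source host while a VM migrates away.
    def __init__(self, free_ports):
        self.free = set(free_ports)   # allocatable for new connections
        self.parked = set()           # migrated, awaiting release
        self.redirect = {}            # port -> destination host

    def migrate_ports(self, ports, dest_host):
        # Park each migrated port and redirect its traffic to the
        # destination host until its active connection ends.
        for port in ports:
            self.free.discard(port)
            self.parked.add(port)
            self.redirect[port] = dest_host

    def release_port(self, port):
        # Destination host signals that the connection using `port` ended.
        self.parked.discard(port)
        self.redirect.pop(port, None)
        self.free.add(port)
```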
Based on the identified port numbers, in some embodiments, the destination host computer advertises (at 1412) a set of IPv6 addresses for the particular migrated port numbers (e.g., an FC00:&lt;IPv4 address&gt;:&lt;port&gt;::/64 IP address prefix), which will be the longest matching prefix at the gateway device and thus supersedes the existing prefix (e.g., the /54 prefix) originally advertised by the source host computer 1420a, without requiring further advertisement by the source host computer 1420a. Those of ordinary skill in the art will appreciate that in some embodiments in which the same port number is reused for different source and destination IP address pairs, the host computer to which the VM is migrated does not make an advertisement and instead relies on redirection at the host computer from which the VM is migrated. As each migrated port is "released" (i.e., the active connection using that port number ends), the destination host computer 1420b notifies (at 1413) the source host computer 1420a that the port has been released and may be used by the source host computer 1420a for a new connection. In some embodiments, the source host computer then removes the freed port from the list of unused ports.
In some embodiments, the destination host computer also advertises (at 1413) that the particular IPv6 address prefix (e.g., the /64 prefix) is no longer available at the destination host computer 1420b. In some embodiments, the source host computer 1420a additionally advertises (at 1414) that the particular IPv6 address prefix (e.g., the /64 prefix) is available at the source host computer 1420a. Those of ordinary skill in the art will appreciate that in embodiments that reuse the same port number for different source and destination IP address pairs, the subsequent advertisements (at 1413 and 1414) are unnecessary because the host computer to which the VM is migrated does not advertise an IPv6 address. In some embodiments, after the destination host computer 1420b withdraws the IPv6 /64 address, the source host computer 1420a omits advertising (at 1414) the IPv6 /64 address prefix associated with the released port and relies on the IPv6 address prefix that covers the particular IPv6 address (e.g., the /54 prefix associated with the assigned port number range) to direct packets to the source host computer 1420a. In some embodiments, the destination host computer 1420b sends the information related to each released port, and advertises that port's unavailability, as the connection using the port ends. Those of ordinary skill in the art will appreciate that VM migration is used as an example, and that the set of data exchanges described above may be used to migrate any similar compute node (e.g., container, pod, etc.).
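The longest-prefix-match behavior that lets the destination host's more-specific advertisement win can be demonstrated directly; the prefix values below are illustrative, following the /54 and /64 forms discussed above:

```python
import ipaddress

def next_hop(dst, rib):
    # Longest-prefix match, as a router's forwarding lookup would do.
    addr = ipaddress.IPv6Address(dst)
    matches = [(net, hop) for net, hop in rib if addr in net]
    return max(matches, key=lambda entry: entry[0].prefixlen)[1]

rib = [
    (ipaddress.IPv6Network("fd00:c6a8:2001::/54"), "source-host"),
    (ipaddress.IPv6Network("fd00:c6a8:2001:1::/64"), "destination-host"),
]
```

A packet for the migrated port matches the /64 and is delivered to the destination host; every other port in the range still falls through to the /54 and reaches the source host.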
Many of the above features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer-readable storage medium (also referred to as a computer-readable medium). When these instructions are executed by one or more processing units (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer-readable media include, but are not limited to, CD-ROMs, flash memory drives, RAM chips, hard drives, EPROMs, and the like. Computer-readable media do not include carrier waves and electronic signals transmitted wirelessly or through a wired connection.
In this specification, the term "software" is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor. Furthermore, in some embodiments, multiple software inventions may be implemented as sub-portions of a larger program while retaining different software inventions. In some embodiments, multiple software inventions may also be implemented as separate programs. Finally, any combination of separate programs that collectively implement the software invention described herein is within the scope of the present invention. In some embodiments, one or more specific machine implementations are defined that execute and run the operations of the software program when the software program is installed to run on one or more electronic systems.
FIG. 15 conceptually illustrates a computer system 1500 with which some embodiments of the invention are implemented. Computer system 1500 can be used to implement any of the host computers, controllers, and managers described above. It can therefore be used to perform any of the above-described processes. Such computer systems include various types of non-transitory machine-readable media and interfaces for various other types of machine-readable media. Computer system 1500 includes a bus 1505, a processing unit(s) 1510, a system memory 1525, a read only memory 1530, a persistent storage device 1535, an input device 1540, and an output device 1545.
Bus 1505 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of computer system 1500. For example, bus 1505 communicatively connects processing unit(s) 1510 with read-only memory 1530, system memory 1525, and persistent storage 1535.
From these various memory units, processing unit(s) 1510 retrieve instructions to be executed and data to be processed to perform the processes of the present invention. In different embodiments, the processing unit(s) may be a single processor or a multi-core processor. Read Only Memory (ROM) 1530 stores static data and instructions required by processing unit(s) 1510 and other modules of the computer system. On the other hand, persistent storage 1535 is a read-write memory device. This device is a non-volatile memory unit that stores instructions and data even when computer system 1500 is turned off. Some embodiments of the invention use a mass storage device (such as a magnetic or optical disk and its corresponding disk drive) as the persistent storage device 1535.
Other embodiments use removable storage devices (such as floppy disks, flash drives, etc.) as the permanent storage device. Like persistent storage 1535, system memory 1525 is a read-write memory device. However, unlike storage device 1535, system memory 1525 is a volatile read-write memory, such as random access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the processes of the present invention are stored in system memory 1525, persistent storage 1535 and/or read only memory 1530. Processing unit(s) 1510 retrieve instructions to execute and data to process from these various memory units to perform processes of some embodiments.
Bus 1505 also connects to input device 1540 and output devices 1545. Input device 1540 enables a user to communicate information and select commands to the computer system. Input devices 1540 include an alphanumeric keyboard and pointing device (also referred to as a "cursor control device"). An output device 1545 displays the images generated by the computer system. The output devices include printers and display devices, such as Cathode Ray Tubes (CRTs) or Liquid Crystal Displays (LCDs). Some embodiments include devices, such as touch screens, that function as both input and output devices.
Finally, as shown in FIG. 15, bus 1505 also couples computer system 1500 to network 1565 through a network adapter (not shown). In this manner, the computer may be part of a computer network (such as a local area network ("LAN"), a wide area network ("WAN") or an intranet) or a network of networks (such as the Internet). Any or all of the components of computer system 1500 may be used in conjunction with the present invention.
Some embodiments include electronic components, such as microprocessors, storage, and memory, that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as a computer-readable storage medium, machine-readable medium, or machine-readable storage medium). Some examples of such computer-readable media include RAM, ROM, compact disc read-only (CD-ROM), compact disc recordable (CD-R), compact disc rewritable (CD-RW), digital versatile disc read-only (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini SD cards, micro SD cards, etc.), magnetic and/or solid state disk drives, read-only and recordable Blu-Ray® discs, super-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable medium may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files containing higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
Although the discussion above refers primarily to microprocessors or multi-core processors executing software, some embodiments are performed by one or more integrated circuits, such as Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs). In some embodiments, such integrated circuits execute instructions stored on the circuits themselves.
As used in this specification, the terms "computer," "server," "processor," and "memory" all refer to electronic or other technical devices. These terms do not include a person or group of people. For purposes of this specification, the term "display" or "displaying" refers to displaying on an electronic device. The terms "computer-readable medium," "plurality of computer-readable media," and "machine-readable medium" as used in this specification are entirely limited to tangible physical objects that store information in a computer-readable form. These terms do not include any wireless signals, wired download signals, and any other transitory or temporary signals.
Although the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be practiced in other specific forms without departing from the spirit of the invention. For example, several figures conceptually illustrate a process. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, a process may be implemented using several sub-processes, or as part of a larger macro process. It is therefore to be understood that the invention is not to be limited by the foregoing illustrative details, but is to be defined by the appended claims.

Claims (44)

1. A method at a gateway device to facilitate source network address translation (SNAT) middlebox service operations for a first network, the method comprising:
receiving a packet from a second network comprising an internet protocol version 4 (IPv4) header, the IPv4 header having a first destination IPv4 address and a first destination port;
identifying a SNAT record mapping the first destination IPv4 address and port to a second destination IP address and second destination port for inclusion in an IP version 6 (IPv6) header;
encapsulating the received packet with an IPv6 header using the second destination IPv6 address and port; and
forwarding the encapsulated packet along the first network so that an SNAT middlebox service instance within the first network receives the packet with the IPv4 header and replaces the first destination IP address and port with a third destination IP address and third destination port so that the packet can then be provided to a destination machine connected to the first network.
2. The method of claim 1, wherein forwarding the encapsulated packet comprises: forwarding the encapsulated packet to a host computer on which the SNAT middlebox service instance executes, and wherein the host computer removes IPv6 encapsulation and forwards the received packet to the SNAT middlebox service instance based on a first destination IPv4 address.
3. The method of claim 2, wherein forwarding the encapsulated packet to the host computer is based on a routing entry in an IPv6 routing table of the gateway device, the routing entry created based on the host computer advertising a second destination IP address as available at the host computer.
4. The method of claim 3, wherein the advertised IPv6 address is an IPv6 address prefix based on a first destination IPv4 address, the first destination IPv4 address being used by the SNAT middlebox service instance to replace a source address of a packet sent within a first network to which the SNAT middlebox service is provided from the SNAT middlebox service instance.
5. The method of claim 4, wherein
the SNAT middlebox service is a distributed SNAT (dSNAT) middlebox service implemented by a plurality of dSNAT middlebox service instances executing on a plurality of host computers and using the first destination IPv4 address to provide the dSNAT middlebox service for packets sent from within the first network,
each dSNAT middlebox service instance is assigned a non-overlapping range of port numbers for use in providing the dSNAT middlebox service, and
each host computer advertises a different IPv6 address prefix based on the first destination IPv4 address and the range of port numbers assigned to the dSNAT middlebox service instance executing on that host computer.
6. The method of claim 1, wherein the SNAT record is a record identifying a packet destined for the first destination IPv4 address as requiring the identifying and encapsulating operations in order to forward the packet using the identified second destination IP address.
7. The method of claim 1, wherein the second destination IP address is an IPv6 address comprising a first destination IPv4 address and a set of bits for the first destination port.
8. The method of claim 7, wherein the IPv6 address comprises a first set of bits indicating that the address is not necessarily globally unique, a second set of bits comprising the IPv4 address, and a third set of bits comprising a set of bits belonging to a range of port numbers assigned to the SNAT middlebox instance.
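The three-field address of claims 7 and 8 can be sketched in a few lines. The layout below — a ULA-style fd00::/8 marker for the "not necessarily globally unique" bits, the IPv4 address in the next field, and the 16-bit port in the low bits — is an illustrative assumption, not the claimed bit layout:

```python
import ipaddress

def embed_ipv4_port(ipv4: str, port: int) -> ipaddress.IPv6Address:
    """Concatenate three illustrative bit fields into one IPv6 address:
    an fd00::/8 flag (address is not necessarily globally unique), the
    public IPv4 address, and the 16-bit port number in the low bits."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    flag = 0xFD << 120           # first set of bits: locally scoped marker
    return ipaddress.IPv6Address(flag | (v4 << 16) | (port & 0xFFFF))

# 203.0.113.7 is 0xCB007107, so the address ends ...cb00:7107:<port>.
print(embed_ipv4_port("203.0.113.7", 0x4321))  # fd00::cb00:7107:4321
```

Because the port number occupies the low bits, all addresses generated by one dSNAT instance share a common prefix determined by its assigned port-number range, which is what the host advertises.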
9. The method of claim 1, wherein the SNAT record is received from a controller computer of the first network.
10. The method of claim 1, wherein a set of IPv6 routing entries associated with a set of SNAT middlebox service instances are received from a controller computer for inclusion in an IPv6 routing table of the gateway device, and forwarding the encapsulated packet comprises performing a lookup in the IPv6 routing table to identify a next hop for forwarding the encapsulated packet to the SNAT middlebox service instance.
11. A method that facilitates providing a distributed source network address translation (dSNAT) middlebox service at a host computer of a first network, the dSNAT middlebox service being implemented by a plurality of dSNAT middlebox service instances executing on a plurality of host computers, each dSNAT middlebox service instance using a same external IPv4 address as a source address of serviced packets, the method comprising:
receiving a packet from a gateway device executing between the first network and a second network, the packet comprising an internal IPv4 packet and an internet protocol version 6 (IPv6) encapsulation header, the IPv6 encapsulation header having a first destination IP address and a first destination port number;
removing the IPv6 encapsulation header to identify a dSNAT middlebox service instance executing on the host computer that is associated with a second destination IP address and a second destination port number in an IPv4 header of the internal IPv4 packet; and
forwarding the internal IPv4 packet along the first network so that the identified dSNAT middlebox service instance within the first network receives the packet with the IPv4 header and replaces the second destination IP address and the second destination port number with a third destination IP address and port number so that the packet can then be supplied to a destination machine connected to the first network.
12. The method of claim 11, wherein the internal IPv4 packet is sent as an IPv4 packet by a source device in the second network using the second IP address and port number as the destination IP address and destination port number in the IPv4 packet header, and the gateway device receives the IPv4 packet for forwarding to the dSNAT middlebox service instance in the first network.
13. The method of claim 12, wherein the gateway device generates the IPv6 encapsulation header and forwards the encapsulated packet to the host machine based on a first destination IP address in the IPv6 encapsulation header.
14. The method of claim 13, wherein the gateway device identifies a first destination IP address in the IPv6 encapsulation header based on a second IP address and a port number in the IPv4 header.
15. The method of claim 14, wherein the gateway device is configured by a controller computer in a first network to (1) identify a first IP destination address, (2) encapsulate an IPv4 packet destined for a second IP address with the IPv6 header using the first IP destination address, and (3) forward a data message destined for the first IPv6 address to the host computer.
16. The method of claim 11, further comprising: advertising to the gateway device an IPv6 address prefix including the first destination IP address as being available at the host computer to cause the gateway device to forward packets destined for the IPv6 destination address to the host computer, the advertised IPv6 address prefix being a longest matching prefix for the IPv6 destination address.
17. The method of claim 16, further comprising identifying the IPv6 address prefix based on: (1) the IPv4 address that is used by the dSNAT middlebox service instance as a source IP address for outgoing packets, and (2) the range of port numbers that is assigned to the dSNAT middlebox service instance to be used as source ports for outgoing packets.
18. The method of claim 16, wherein advertising the IPv6 address prefix to the gateway device comprises: advertising the IPv6 address prefix to a route reflector as being available at the host computer, the route reflector in turn advertising the IPv6 address prefix to the gateway device as being available at the host computer.
19. The method of claim 16, wherein advertising the IPv6 address prefix to the gateway device further comprises providing information for configuring the gateway device to perform IPv6 encapsulation on data messages destined for the second IP address.
20. The method of claim 11, wherein the packet is a first packet in a packet flow between a device in a second network and the destination machine, the method further comprising:
receiving a second packet in the packet flow from a source machine, the source machine being the destination machine of the first packet, the second packet having a first source IP address identical to the second destination IP address and a first source port identical to the second destination port; and
forwarding the second packet to the device in the second network without an encapsulation that uses the first destination IP address as either a source IP address or a destination IP address,
wherein the source machine, as the destination machine of the first packet, sends the second packet having a second source IP address identical to the third destination IP address and a second source port identical to the third destination port, the second source IP address and second source port being replaced with the first source IP address and first source port by a dSNAT service operation performed by the identified dSNAT middlebox service instance.
21. A method to facilitate providing a distributed source network address translation (dSNAT) middlebox service for a first network at a dSNAT middlebox service instance executing on a host computer, the method comprising:
receiving configuration information relating to the dSNAT middlebox service, the configuration information including (1) a public internet protocol version 4 (IPv4) address for each instance of the dSNAT middlebox service to use as a source IP address for data messages sent from the first network through the dSNAT middlebox service instance, and (2) a range of port numbers assigned to the dSNAT middlebox service instance executing on the host computer;
identifying an IP version 6 (IPv6) address prefix for advertising the dSNAT middlebox service instance provided at the host computer based on the received IPv4 address and the assigned port number range; and
advertising, to a gateway device providing access to a network external to the first network, the identified IPv6 address prefix as reachable at the host computer to direct packets received at the gateway device from the external network, using the public IPv4 address and a port number in the range of port numbers allocated to the dSNAT middlebox service instance, to the dSNAT middlebox service instance executing on the host computer.
22. The method of claim 21, wherein the host computer is a first host computer, the dSNAT middlebox service instance executing on the first host computer is a first dSNAT middlebox service instance, and the identified IPv6 address prefix is a first IPv6 address prefix, wherein
a second dSNAT middlebox service instance executes on a second host computer and uses the public IPv4 address as a source address for data messages sent out of the first network through the second dSNAT middlebox service instance,
the second dSNAT middlebox service instance is assigned a different range of port numbers than the first dSNAT middlebox service instance, and
the second host computer advertises a second IPv6 address prefix that is different from the first IPv6 address prefix as reachable at the second host computer.
23. The method of claim 21, wherein the configuration information is received from a controller computer that provides configuration data to the dSNAT middlebox service instance to configure the dSNAT middlebox service instance to use the public IPv4 address and the assigned port number range.
24. The method of claim 21, wherein
identifying the IPv6 address prefix comprises generating the IPv6 address prefix to begin with a first number of bits common to all of the dSNAT middlebox service instances, the common bits including the public IPv4 address used as a source IP address by each dSNAT middlebox service instance, followed by a second number of bits representing the range of port numbers allocated to the dSNAT middlebox service instance, and
advertising the identified IPv6 address prefix comprises advertising the IPv6 address prefix and a prefix length.
25. The method of claim 24, wherein the bits representing the range of port numbers are a set of leftmost bits that are common to every port number in the assigned range of port numbers.
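The "leftmost bits common to all port numbers" of claims 24 and 25 can be computed directly when the assigned range is an aligned power-of-two block. The helper below is an illustrative sketch under that assumption, not the claimed implementation:

```python
def port_range_prefix(lo: int, hi: int) -> tuple:
    """Return (common_bits, count): the leftmost bits shared by every
    16-bit port number in [lo, hi], assuming the assigned range is an
    aligned power-of-two block (e.g. 0x4000-0x7fff)."""
    varying = (lo ^ hi).bit_length()   # low-order bits that differ within the range
    count = 16 - varying               # leftmost bits common to the whole range
    return lo >> varying, count

# Ports 0x4000-0x7fff all begin with the two bits "01":
print(port_range_prefix(0x4000, 0x7FFF))  # (1, 2)
```

Under the illustrative address layout, these common bits would follow the public IPv4 address in the advertised prefix, and the advertised prefix length would be the length of the shared leading fields plus `count` bits.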
26. The method of claim 21, wherein advertising the IPv6 address prefix to the gateway device as reachable at the host computer comprises: advertising the IPv6 address prefix to a route reflector as reachable at the host computer, the route reflector then advertising the IPv6 address prefix to the gateway device as reachable at the host computer.
27. The method of claim 26, wherein a Managed Forwarding Element (MFE) executing on the host computer receives the configuration information, identifies the IPv6 address prefix, and advertises the identified IPv6 address prefix.
28. The method of claim 27, wherein the MFE is a Border Gateway Protocol (BGP) instance and the advertisement comprises a BGP advertisement.
29. The method of claim 28, wherein the dSNAT middlebox service is a first dSNAT middlebox service and the host computer facilitates providing a second dSNAT middlebox service of a second network at a second dSNAT middlebox service instance executing on the host computer, the configuration information is first configuration information, the public IPv4 address is a first IPv4 address, the port number range is a first port number range, and the identified IPv6 address prefix is a first IPv6 address prefix, the method further comprising:
receiving second configuration information relating to the second dSNAT middlebox service, the second configuration information including (1) a second internet protocol version 4 (IPv4) address for each instance of the second dSNAT middlebox service to use as a source IP address for data messages sent from the second network through the second dSNAT middlebox service instance, and (2) a second range of port numbers assigned to the second dSNAT middlebox service instance executing on the host computer;
identifying a second IP version 6 (IPv6) address prefix for advertising the second dSNAT middlebox service instance provided at the host computer based on the received second IPv4 address and the assigned second port number range; and
advertising the identified second IPv6 address prefix as reachable at the host computer to a gateway device that provides access to a network external to the second network.
30. The method of claim 26, wherein a set of forwarding elements in an intervening network fabric between the gateway device and the host computer associates the advertised IPv6 address prefix with a path from the gateway device to the host computer based on the advertisement.
31. The method of claim 21, wherein the port number range is determined by a controller computer providing the configuration information relating to the dSNAT middlebox service instance.
32. The method of claim 31, wherein the range of port numbers is a first range of port numbers, the identified IPv6 address prefix is a first IPv6 address prefix, and the controller computer subsequently determines a second range of port numbers allocated to the dSNAT middlebox service instance, the method further comprising:
receiving the second range of port numbers assigned to the dSNAT middlebox service instance executing on the host computer;
identifying a second IPv6 address prefix for advertising the dSNAT middlebox service instance provided at the host computer based on the received IPv4 address and the second range of port numbers assigned to the dSNAT middlebox service instance; and
advertising the identified second IPv6 address prefix to the gateway device as reachable at the host computer.
33. The method of claim 32, wherein the first port number range is based on a first number of host computers executing dSNAT middlebox service instances of the dSNAT middlebox service and the second port number range is based on a second number of host computers executing dSNAT middlebox service instances of the dSNAT middlebox service.
34. The method of claim 33, wherein the first port number range includes more port numbers than the second port number range, and the second number of host computers executing dSNAT middlebox service instances of the dSNAT middlebox service is greater than the first number of host computers executing dSNAT middlebox service instances of the dSNAT middlebox service.
35. The method of claim 33, wherein the second port number range includes more port numbers than the first port number range, and the second number of host computers executing dSNAT middlebox service instances of the dSNAT middlebox service is less than the first number of host computers executing dSNAT middlebox service instances of the dSNAT middlebox service.
36. The method of claim 33, wherein the first number of host computers is below a threshold number for reassigning port number ranges and the second number of host computers is above the threshold number for reassigning port number ranges.
37. The method of claim 36, wherein each host computer executing a dSNAT middlebox service instance of the dSNAT middlebox service is assigned a range of port numbers covering a same number of port numbers, the first port number range is based on dividing the entire range of possible port numbers into a number of first port number ranges equal to the threshold number, and the second port number range is based on dividing the entire range of possible port numbers into a number of second port number ranges greater than the second number of host computers executing dSNAT middlebox service instances of the dSNAT middlebox service.
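The equal partitioning in claim 37 amounts to dividing the usable port space into contiguous, equally sized, non-overlapping ranges, one per instance. The sketch below is illustrative; the choice of ports 1024-65535 as the usable space and the handling of the remainder are assumptions, not claimed details:

```python
def partition_ports(num_ranges: int, first: int = 1024, last: int = 65535):
    """Divide the usable port space [first, last] into num_ranges
    contiguous, non-overlapping ranges of (nearly) equal size, one per
    dSNAT middlebox service instance; the last range absorbs the remainder."""
    size = (last - first + 1) // num_ranges
    return [
        (first + i * size,
         last if i == num_ranges - 1 else first + (i + 1) * size - 1)
        for i in range(num_ranges)
    ]

# Four host computers each receive a quarter of ports 1024-65535:
print(partition_ports(4))
# [(1024, 17151), (17152, 33279), (33280, 49407), (49408, 65535)]
```

Dividing by the threshold number rather than the current host count leaves spare ranges that can be handed to newly added hosts without renumbering the existing instances.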
38. The method of claim 32, wherein the first port number range is based on a first number of workload machines executing on the host computer for which the dSNAT middlebox service instance provides the dSNAT middlebox service, and the second port number range is based on a second number of workload machines executing on the host computer for which the dSNAT middlebox service instance provides the dSNAT middlebox service.
39. The method of claim 38, wherein the second range of port numbers includes more port numbers than the first range of port numbers, and the second number of workload machines is greater than the first number of workload machines.
40. The method of claim 38, wherein the second range of port numbers includes fewer port numbers than the first range of port numbers, and the second number of workload machines is less than the first number of workload machines.
41. A machine readable medium storing a program which, when executed by at least one processing unit, implements the method of any one of claims 1-40.
42. An electronic device, comprising:
a set of processing units; and
a machine readable medium storing a program which, when executed by at least one of the processing units, implements the method of any one of claims 1-40.
43. A system comprising means for implementing the method of any one of claims 1-40.
44. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-40.
CN202180061371.9A 2020-07-16 2021-05-01 Facilitating distributed SNAT services Pending CN116158064A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US16/931,207 2020-07-16
US16/931,196 US11616755B2 (en) 2020-07-16 2020-07-16 Facilitating distributed SNAT service
US16/931,196 2020-07-16
US16/931,207 US11606294B2 (en) 2020-07-16 2020-07-16 Host computer configured to facilitate distributed SNAT service
PCT/US2021/030371 WO2022015394A1 (en) 2020-07-16 2021-05-01 Facilitating distributed snat service

Publications (1)

Publication Number Publication Date
CN116158064A true CN116158064A (en) 2023-05-23

Family

ID=76076480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180061371.9A Pending CN116158064A (en) 2020-07-16 2021-05-01 Facilitating distributed SNAT services

Country Status (3)

Country Link
EP (1) EP4078935A1 (en)
CN (1) CN116158064A (en)
WO (1) WO2022015394A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10742746B2 (en) 2016-12-21 2020-08-11 Nicira, Inc. Bypassing a load balancer in a return path of network traffic
US11606294B2 (en) 2020-07-16 2023-03-14 Vmware, Inc. Host computer configured to facilitate distributed SNAT service
US11616755B2 (en) 2020-07-16 2023-03-28 Vmware, Inc. Facilitating distributed SNAT service
US11451413B2 (en) 2020-07-28 2022-09-20 Vmware, Inc. Method for advertising availability of distributed gateway service and machines at host computer
US11902050B2 (en) 2020-07-28 2024-02-13 VMware LLC Method for providing distributed gateway service at host computer

Also Published As

Publication number Publication date
EP4078935A1 (en) 2022-10-26
WO2022015394A1 (en) 2022-01-20

Similar Documents

Publication Publication Date Title
US10911360B2 (en) Anycast edge service gateways
US11616755B2 (en) Facilitating distributed SNAT service
US11902050B2 (en) Method for providing distributed gateway service at host computer
US11736394B2 (en) Address resolution using multiple designated instances of a logical router
CN112039768B (en) Intermediate logical interface in a virtual distributed routing environment
US10637800B2 (en) Replacement of logical network addresses with physical network addresses
US11451413B2 (en) Method for advertising availability of distributed gateway service and machines at host computer
CN109937401B (en) Live migration of load-balancing virtual machines via traffic bypass
CN116158064A (en) Facilitating distributed SNAT services
CN106134137B (en) Route advertisement for managed gateways
US11606294B2 (en) Host computer configured to facilitate distributed SNAT service
US11595345B2 (en) Assignment of unique physical network addresses for logical network addresses
JP5763081B2 (en) Method and apparatus for transparent cloud computing using virtualized network infrastructure
US20230124797A1 (en) Stateful services on stateless clustered edge
CN106576075B (en) Method and system for operating a logical network on a network virtualization infrastructure
US9413644B2 (en) Ingress ECMP in virtual distributed routing environment
CN107077579B (en) Method, apparatus, and medium for providing stateful services on stateless cluster edges
US9876714B2 (en) Stateful services on stateless clustered edge
US20220038379A1 (en) Route advertisement to support distributed gateway services architecture
WO2022026012A1 (en) Route advertisement to support distributed gateway services architecture
US9866473B2 (en) Stateful services on stateless clustered edge

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Country or region after: U.S.A.

Address after: California, USA

Applicant after: Weirui LLC

Address before: California, USA

Applicant before: VMWARE, Inc.

Country or region before: U.S.A.