WO2015032430A1 - Scheduling of virtual machines - Google Patents


Info

Publication number
WO2015032430A1
Authority
WO
WIPO (PCT)
Prior art keywords
network traffic
virtual machine
host computer
host
peer
Application number
PCT/EP2013/068297
Other languages
French (fr)
Inventor
Victor Souza
Original Assignee
Telefonaktiebolaget L M Ericsson (Publ)
Application filed by Telefonaktiebolaget L M Ericsson (Publ)
Priority to PCT/EP2013/068297
Publication of WO2015032430A1

Classifications

    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5066: Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G06F 9/5072: Grid computing
    • G06F 9/5077: Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F 11/301: Monitoring arrangements specially adapted to the computing system being monitored, where the computing system is a virtual computing platform, e.g. logically partitioned systems
    • G06F 11/3409: Recording or statistical evaluation of computer activity, e.g. of down time or of input/output operation, for performance assessment
    • G06F 11/3442: Recording or statistical evaluation of computer activity for planning or managing the needed capacity
    • G06F 11/3466: Performance evaluation by tracing or monitoring
    • G06F 2209/502: Proximity (indexing scheme relating to resource allocation, G06F 9/50)
    • H04L 41/0895: Configuration of virtualised networks or elements, e.g. virtualised network function or OpenFlow elements
    • H04L 41/0897: Bandwidth or capacity management by horizontal or vertical scaling of resources, or by migrating entities, e.g. virtual resources or entities
    • H04L 43/04: Processing captured monitoring data, e.g. for logfile generation
    • H04L 43/062: Generation of reports related to network traffic
    • H04L 43/0864: Round trip delays
    • H04L 43/0876: Network utilisation, e.g. volume of load or congestion level
    • H04L 43/0888: Throughput
    • H04L 43/16: Threshold monitoring
    • H04L 43/20: Monitoring or testing arrangements in which the monitoring system or the monitored elements are virtualised, abstracted or software-defined entities, e.g. SDN or NFV

Definitions

  • This invention relates to methods and apparatus for scheduling of virtual machines (VMs) in host computers (hosts) in a datacenter. Specifically, the invention may relate to, but is not limited to, scheduling of VMs in a cloud datacenter.
  • VMs may exist in any of a plurality of hosts (e.g., servers) that form a part of the datacenter and the location of one or more VMs within one or more hosts affects the speed of operations and the amount of network traffic within a datacenter.
  • VM scheduling directly impacts the efficiency of a datacenter and the performance of running VMs within the datacenter.
  • a scheduler may implement different scheduling policies (sometimes called scheduling objectives), depending on the objective of the user and cloud provider.
  • a packing policy minimizes the number of hosts in use.
  • a stripping policy maximizes the resources available to a VM by spreading the VMs across a plurality of hosts.
  • a load-aware policy maximizes the resources available to a VM by using hosts with greater available capacity.
  • Existing technology focuses on the allocation of computation resources and memory. When deploying payload applications, which process the payload of data packets, networking resources are more important. This is the case for most telecommunications and network related products.
  • a VM placement algorithm is disclosed in "Online traffic-aware virtual machine placement in data center networks", Dias & Costa.
  • Traffic between all VMs in a datacenter is used to create a traffic matrix.
  • the datacenter network is modelled as a graph, and the traffic matrix is used to cluster VMs that exchange traffic.
  • In "CloudMirror: Application-Aware Bandwidth Reservations in the Cloud", Lee et al. disclose an abstraction for specifying bandwidth guarantees as a graph between components in a network.
  • a method of achieving more efficient allocation of resources within a datacenter is desirable. This is even truer for telecommunication applications deployed in a datacenter, as such applications are commonly payload applications. Examples of telecommunications applications deployed in datacenters include evolved packet gateways, routers, network proxies and middleboxes, amongst others.
  • a scheduling node for scheduling a virtual machine (110h) to a host computer (108a) in a datacenter.
  • the scheduling node comprises a receiver (304) configured to receive a network traffic profile for the virtual machine, wherein the network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates.
  • the scheduling node comprises a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile.
  • the closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines.
  • the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor.
  • the scheduling node comprises a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.
  • the closeness factor for the one or more candidate host computers (108a, e-g) comprises the inverse of a sum of a number of hops required to transmit data from a candidate host computer to each of a plurality of peer virtual machines (110b-d).
  • the network traffic profile further comprises an amount of network traffic sent to and/or received from the one or more peer virtual machines (110b-d), and wherein the closeness factor for the one or more candidate host computers (108a, e-g) is weighted based on one or more of: the amount of network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d); a priority associated with network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d); and a round trip time for network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d).
  • the scheduler (310) is configured to determine the closeness factor for each of a plurality of candidate host computers (108a, e-g), and wherein the selected host computer (108a) is selected from the plurality of candidate host computers based at least in part on the closeness factor.
  • each of the one or more candidate host computers (108a, e-g) comprises sufficient available resources to host the virtual machine (110h).
  • one or more of the candidate host computers (108a, e-g) is a hardware accelerated host computer.
  • the scheduler (310) is configured to determine a demand of traffic throughput for the virtual machine (110h), and to determine that the virtual machine is to be hosted in a hardware accelerated host computer if the determined demand for traffic throughput is above a threshold value.
  • a method for scheduling a virtual machine (110h) to a host computer (108a) in a datacenter comprises receiving (700), by a receiver (304) of a scheduling network node (300), a network traffic profile for the virtual machine.
  • the network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates.
  • the method comprises determining (710), by a scheduler (310), a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile.
  • the closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines.
  • the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor.
  • the method comprises transmitting (712), by a transmitter (302), instructions for the determined host computer to host the virtual machine.
  • a non-transitory computer readable medium comprising computer readable code configured, when read and executed by a computer, to carry out the method described above.
  • a host computer for determining a network traffic profile of a virtual machine (110h) in a datacenter.
  • the host computer comprises a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine.
  • the host computer comprises a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic.
  • the network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates.
  • the host computer comprises a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.
  • the network traffic profile comprises an amount of the monitored network traffic sent to and/or received from the one or more peer virtual machines (110b-d).
  • the network traffic profile comprises a rate of transmission and/or reception of the monitored network traffic sent to and/or received from the one or more peer virtual machines (110b-d).
  • the network traffic profile comprises an indication of a round trip time for the monitored network traffic sent to and/or received from the one or more peer virtual machines (110b-d).
  • the network traffic profile comprises data indicating a burstiness of the monitored network traffic sent to and/or received from the one or more peer virtual machines (110b-d).
  • the network traffic profile comprises data indicating a priority associated with monitored network traffic.
  • a method for determining a network traffic profile of a virtual machine (110h) in a datacenter comprises monitoring (400), by a monitor (210), network traffic sent to and/or received from the virtual machine.
  • the method comprises determining (402), by a profiler (212), a network traffic profile for the virtual machine based on the monitored network traffic.
  • the network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates.
  • the method comprises transmitting (404), by a transmitter (202), the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.
  • a non-transitory computer readable medium (209) comprising computer readable code configured, when read and executed by a computer, to carry out the method described above.
  • a system for profiling a virtual machine (110h) and scheduling the virtual machine to a selected host computer (108a) in a datacenter comprises a host computer (108a-p) comprising a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine.
  • the host computer comprises a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic.
  • the network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates.
  • the host computer comprises a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.
  • the scheduling node comprises a receiver (304) configured to receive a network traffic profile for the virtual machine.
  • the scheduling node comprises a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile.
  • the closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines.
  • the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor.
  • the scheduling node comprises a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.
  • a method for profiling a virtual machine (110h) and scheduling the virtual machine to a host computer (108a) in a datacenter comprises monitoring (800), by a monitor (210) of a host computer (108a-p), network traffic sent to and/or received from the virtual machine.
  • the method comprises determining (802), by a profiler (212) of the host computer, a network traffic profile for the virtual machine based on the monitored network traffic.
  • the network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates.
  • the method comprises transmitting (804), by a transmitter (202) of the host computer, the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.
  • the method comprises receiving (806), by a receiver (304) of the scheduling node, a network traffic profile for a virtual machine.
  • the method comprises determining (818), by a scheduler (310) of the scheduling node, a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile.
  • the closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines.
  • the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor.
  • the method comprises transmitting (820), by a transmitter (302) of the scheduling node, instructions to the determined host computer to host the virtual machine.
  • Figure 1 is a schematic of a datacenter topology
  • Figure 2 is a schematic of a host computer
  • Figure 3 is a schematic of a scheduling node
  • Figure 4 is a flow diagram of a method for determining a network traffic profile of a virtual machine (110h) in a datacenter
  • Figure 5 is a schematic of a datacenter topology
  • Figure 6 is a schematic of a datacenter topology
  • Figure 7 is a flow diagram of a method for scheduling a virtual machine (110h) to a host computer (108a) in a datacenter;
  • Figure 8 is a flow diagram of a method for profiling a virtual machine (110h) and scheduling the virtual machine to a host computer (108a) in a datacenter.
  • Exemplary methods and apparatus may determine network capacity and location for a plurality of candidate hosts and schedule a VM to one of the plurality of candidate hosts based on the determined network capacity and/or the location. This may provide advantages of reduced network traffic within a datacenter.
  • SR-IOV (single-root input/output virtualization) is a specification that allows a single peripheral component interconnect express (PCIe) device to present itself as multiple separate devices, enabling hardware-accelerated network virtualization. An SR-IOV network card possesses independent virtual functions (VFs), which behave just like an independent network interface controller (NIC) for a VM. This allows VMs to have a dedicated virtual NIC in hardware, with a higher throughput of network traffic than through a virtual switch (in software).
  • In known solutions, VM scheduling is based on the allocation of computation resources and memory, without considering whether a host can provide the VM network capacity that an application needs. Applications may then underperform, time out and/or crash, depending on the nature of the protocols involved. Ideally, scheduling should be adaptable to the networking needs and communication patterns of the datacenter's applications.
  • Methods and apparatus disclosed herein propose heuristic scheduling of VMs in hosts of a datacenter. Such heuristics may be based on a determined network traffic profile of a VM.
  • the network traffic profile may indicate the networking needs of running one or more VMs in one or more hosts in a datacenter, e.g., measured in terms of data packets transmitted/received, possibly over a given time period.
  • the network traffic profile may be determined in an automated fashion by one or more profilers in one or more hosts, for example by a hypervisor application.
  • the one or more profilers configured to determine the network traffic profile may advantageously be located in one or more of a plurality of hosts.
  • a profiler may be located in every host hypervisor where a VM may need to be profiled.
  • a network traffic profile of a VM may be transmitted to a scheduling node of the datacenter comprising a scheduler.
  • the scheduling node possesses a topology map of the underlying network infrastructure and is configured to determine a host computer that should host the VM based on the network traffic profile. It is noted that in some exemplary methods and apparatus, the host configured to monitor the VM and determine the network traffic profile may be collocated with the scheduling node and may be the same node of the datacenter.
  • Hosts with enhanced capabilities may be present in a datacenter.
  • An example of such hosts includes those with hardware accelerated network virtualization (e.g., SR-IOV or Virtual Machine Device Queues (VMDq)).
  • VMs with stringent networking requirements as determined by monitoring network traffic associated with a VM for determining a network traffic profile, may be allocated hosts with better networking features. This may allow for better execution of payload applications.
  • Exemplary methods and apparatus disclosed herein may group VMs that communicate frequently with each other. That is, a VM will commonly communicate with a plurality of peer VMs, and the VM may be located in close proximity to such peer VMs. By doing so, the amount of network traffic in a datacenter may be reduced and unnecessary overloading of the datacenter links may be avoided.
  • Methods and apparatus disclosed herein may use a notion of "closeness", captured by a closeness factor as defined herein, to determine whether a host is in close proximity to peer VMs (or host computers hosting peer VMs). The closeness factor may be weighted by the amount of traffic that the monitored VM sends to and receives from each peer VM.
  • Exemplary methods and apparatus may comprise one or more of the features described below.
  • Figure 1 shows a typical structure of a datacenter network comprising a plurality of nodes 100, 102a-b, 104a-d, 106a-h, 108a-p.
  • a datacenter border gateway 100 may control access for the datacenter network to the Internet.
  • the datacenter border gateway 100 may therefore be a router.
  • the datacenter border gateway is in electrical communication with two layer 2 (of the open systems interconnection (OSI) model) switches 102a-b.
  • the layer 2 switches 102a-b are each in electrical communication with four aggregation switches 104a-d.
  • the aggregation switches 104a-d are each in electrical communication with two edge (or access) switches 106a-h.
  • the edge switches 106a-h are each in electrical communication with two host computers 108a-p, which are configured to host one or more VMs 110a-p.
  • Not all of the hosts 108a-p and VMs 110a-p are individually referenced in Figure 1, but these features run consecutively from left to right, with "a" on the left-hand side.
  • This typical arrangement of a datacenter network is also known as a tree structure.
  • Other arrangements include VL2, fat-tree and BCube. It is noted that the solution presented herein can also be applied to these other types of arrangement, and more.
  • the topological map of the datacenter network is stored in the scheduler.
  • the numbers and arrangement of the nodes 100, 102a-b, 104a-d, 106a-h, 108a-p of the datacenter topology shown in Figure 1 are for illustrative purposes only. Each node may be in electrical connection with more or fewer nodes than shown in Figure 1. Further, it is noted that the structure of Figure 1 is a hierarchical structure, in which hosts 108a-p may be considered the leaves of the network. Hosts 108a-p can have different network cards, features and capacity. Hosts 108a-p may have support for hardware-accelerated network virtualization.
  • Figure 2 shows a schematic representation of a host computer 108.
  • the methods and apparatus disclosed herein permit the host computer 108 to monitor network traffic associated with a VM to determine a network traffic profile for the VM.
  • the host 108 comprises a transmitter 202 and a receiver 204, which form part of a communication unit 205.
  • the transmitter 202 and receiver 204 are in electrical communication with other nodes and/or functions in a datacenter and are configured to transmit and receive data accordingly.
  • the term "network traffic profile" encompasses data relating to the type (e.g., TCP or UDP), rate (e.g. packets/sec and/or bytes/sec), burstiness, priority settings, round trip time (RTT), source and/or destination of network traffic sent from and received by a VM.
  • the network traffic profile may comprise addresses of peer VMs 110 with which a VM communicates, an amount of network traffic sent from and/or received by the VM and/or an indication of the rate of data sent from and/or received by the VM (e.g. in packets/sec and bytes/sec).
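  • As a rough illustration of the kind of record such a profile could map onto, the following Python sketch defines a per-peer entry and a profile keyed by peer VM IP address. The class and field names (PeerTraffic, NetworkTrafficProfile, rtt_ms, burstiness, priority) are illustrative choices and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PeerTraffic:
    """Counters kept for one peer VM (field names are illustrative)."""
    packets_per_sec: float = 0.0
    bytes_per_sec: float = 0.0
    rtt_ms: float = 0.0       # round trip time estimate
    burstiness: float = 0.0   # e.g. peak rate divided by average rate
    priority: int = 0         # e.g. priority setting derived from packet markings

@dataclass
class NetworkTrafficProfile:
    """Traffic profile of one monitored VM, keyed by peer VM IP address."""
    vm_id: str
    peers: Dict[str, PeerTraffic] = field(default_factory=dict)

# Example: monitored VM 110h exchanging traffic with one peer VM.
profile = NetworkTrafficProfile(vm_id="110h")
profile.peers["10.0.0.2"] = PeerTraffic(packets_per_sec=1200, bytes_per_sec=9.6e5)
```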
  • the host 108 further comprises a memory 206 and a processor 208.
  • the memory 206 may comprise non-volatile memory and/or volatile memory.
  • the memory 206 may have a computer program 207 stored therein.
  • the computer program 207 may be configured to undertake the methods disclosed herein.
  • the computer program 207 may be loaded in the memory 206 from a non-transitory computer readable medium 209, on which the computer program is stored.
  • the processor 208 is configured to undertake the functions of a monitor 210, and a profiler 212.
  • Each of the transmitter 202, receiver 204, communications unit 205, memory 206, processor 208, monitor 210 and profiler 212 is in electrical communication with the other features 202, 204, 205, 206, 208, 210, 212 of the host 108.
  • the host 108 can be implemented as a combination of computer hardware and software.
  • the monitor 210 and profiler 212 may be implemented as software configured to run on the processor 208.
  • the memory 206 stores the various programs/executable files that are executed in a processor 208, and also provides a storage unit for any required data.
  • the programs/executable files stored in the memory 206, and executed in the processor 208, can include the monitor 210 and the profiler 212, but are not limited to such.
  • Figure 3 shows a schematic representation of a scheduling node 300 of a datacenter network configured to schedule a VM to a host 108 in accordance with the methods and apparatus disclosed herein.
  • the scheduling node 300 may be configured to receive a network traffic profile and schedule a VM to a host 108, based on that received profile.
  • the scheduling node 300 may form part of any of the nodes 100, 102, 104, 106, 108 of the datacenter topology shown in Figure 1.
  • the scheduling node 300 may be a separate node that is independent from the topology shown in Figure 1, typically part of a datacenter cloud management system.
  • the scheduling node 300 may be the same node as the host computer 108 configured to monitor the VM and determine the network traffic profile, as set out below.
  • the transmission of the network traffic profile to the scheduling node 300 may be an internal transmission within a single node.
  • the scheduling node 300 may be deployed in a virtual machine that may be physically deployed in any of the hosts.
  • the scheduling node 300 comprises a transmitter 302 and a receiver 304, which form part of a communication unit 305.
  • the transmitter 302 and receiver 304 are in electrical communication with other nodes and/or functions in a datacenter and are configured to transmit and receive data accordingly.
  • the scheduling node 300 further comprises a memory 306 and a processor 308.
  • the memory 306 may comprise non-volatile memory and/or volatile memory.
  • the memory 306 may have a computer program 307 stored therein.
  • the computer program 307 may be configured to undertake the methods disclosed herein.
  • the processor 308 is configured to undertake the functions of a scheduler 310.
  • Each of the transmitter 302, receiver 304, communications unit 305, memory 306, processor 308 and scheduler 310 is in electrical communication with the other features 302, 304, 305, 306, 308, 310 of the scheduling node 300.
  • the scheduling node 300 can be implemented as a combination of computer hardware and software.
  • the scheduler 310 may be implemented as software configured to run on the processor 308.
  • the memory 306 stores the various programs/executable files that are executed in a processor 308, and also provides a storage unit for any required data.
  • the programs/executable files stored in the memory 306, and executed in the processor 308, can include the scheduler 310, but are not limited to such.
  • the methods and apparatus disclosed herein are capable of profiling the networking needs of one or more VMs 110a-p.
  • the objective is to perform characterisation of the network traffic pattern sent and/or received by a VM 110a-p. This is done through the use of enhanced counters that may be added to a hypervisor function running on a host 108a-p.
  • the monitor 210 and profiler 212 may form part of a hypervisor.
  • Existing counters typically monitor a total amount of traffic in/out of a network card.
  • a VM 110h is considered to be a monitored VM and the VMs 110b-d are considered to be peer VMs with which the monitored VM communicates.
  • the monitor 210 may comprise an enhanced counter configured to log the source and/or destination of network traffic received and/or sent by the monitored VM 110h.
  • the monitor 210 may also comprise enhanced counters configured to determine an amount of network traffic sent to and/or received from each of a plurality of peer VMs 110b-d, which may be averaged over a period of time (packets/sec and bytes/sec).
  • the profiler 212 may be configured to output a list of the peer VMs 110b-d with which a monitored VM 110h communicates. This list may contain the IP addresses of the peer VMs 110b-d and the average amount of network traffic (packets/sec and bytes/sec) sent to and/or received from each peer VM 110b-d. The profiler 212 may also be configured to determine the rate and/or burstiness of network traffic sent to and/or received from each peer VM 110b-d and a round trip time (RTT) for network traffic sent to and/or received from each peer VM 110b-d. Other networking related parameters may also be monitored and determined.
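  • A minimal sketch of such an enhanced counter is given below, assuming per-packet visibility in the hypervisor datapath; the class name, method names and the averaging over the whole monitoring period are assumptions made for illustration.

```python
import time
from collections import defaultdict

class EnhancedCounter:
    """Per-peer counters for one monitored VM (sketch only; a real
    implementation would hook into the hypervisor's vNIC datapath)."""

    def __init__(self, monitored_ip: str):
        self.monitored_ip = monitored_ip
        self.bytes_per_peer = defaultdict(int)
        self.packets_per_peer = defaultdict(int)
        self.started = time.time()

    def record_packet(self, src_ip: str, dst_ip: str, size: int) -> None:
        # Attribute the packet to the peer end of the flow.
        if src_ip == self.monitored_ip:
            peer = dst_ip
        elif dst_ip == self.monitored_ip:
            peer = src_ip
        else:
            return  # not traffic of the monitored VM
        self.bytes_per_peer[peer] += size
        self.packets_per_peer[peer] += 1

    def profile(self) -> dict:
        """Average packets/sec and bytes/sec per peer since monitoring began."""
        elapsed = max(time.time() - self.started, 1e-9)
        return {
            peer: {
                "packets_per_sec": self.packets_per_peer[peer] / elapsed,
                "bytes_per_sec": self.bytes_per_peer[peer] / elapsed,
            }
            for peer in self.packets_per_peer
        }
```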
  • the VMs 110a-h may be profiled at specific times while the datacenter is in operation and the information fed to the scheduler 310 of the scheduling node 300.
  • the profiler 212 may apply certain techniques to minimize the amount of data sent to the scheduler 310. Examples include filtering out peer VMs 110b-d with a traffic volume below a certain threshold and reporting to the scheduler 310 only when substantial changes in a network traffic profile are determined.
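  • The following sketch illustrates both of these techniques, with an assumed byte-rate threshold and an assumed 20% relative-change test; the function names and threshold values are illustrative, not taken from the patent.

```python
def filter_profile(profile: dict, min_bytes_per_sec: float = 1e4) -> dict:
    """Drop peers whose average traffic volume is below a threshold."""
    return {
        peer: stats
        for peer, stats in profile.items()
        if stats["bytes_per_sec"] >= min_bytes_per_sec
    }

def changed_substantially(old: dict, new: dict, rel_tol: float = 0.2) -> bool:
    """True when the peer set changed or any per-peer byte rate moved by
    more than rel_tol; only then is a report sent to the scheduler."""
    if set(old) != set(new):
        return True
    for peer, stats in new.items():
        before = old[peer]["bytes_per_sec"] or 1e-9
        if abs(stats["bytes_per_sec"] - before) / before > rel_tol:
            return True
    return False
```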
  • Figure 4 shows a method for determining a network traffic profile for a VM 110h.
  • a monitor 210 of a host 108a-p monitors 400 network traffic associated with the VM 110h.
  • the monitor may therefore monitor one or more of the destination address, source address, rate and amount of the data sent to and/or received from the monitored VM 110h. This may be done by intercepting such traffic and receiving it at the receiver 204 of the host 108a-p. Therefore, it is advantageous for the monitor 210 configured to monitor the traffic of the VM 110h to be in the host 108h in which the VM 110h resides.
  • the profiler 212 of the host 108a-p determines 402 a network traffic profile for the monitored VM 110h based on the monitored network traffic.
  • the network traffic profile represents information indicating the networking needs of the monitored VM 110h.
  • the profile indicates the peer VMs 110b-d with which the monitored VM communicates and the amount of communication with each, which may be used to determine the best location for the monitored VM.
  • the network traffic profile may indicate the rate (e.g. in packets/sec and/or bytes/sec), burstiness, priority settings, and RTT of the network traffic, all of which may be used to determine the hardware requirements of a host 108 in which the VM 110h is to reside.
  • the transmitter 202 of the host 108 transmits 404 the network traffic profile to the scheduling node 300.
  • the network traffic profile information may be used to schedule the monitored VM 110h to reside in an optimal location (host 108a-p).
  • the optimal location may take into account the network resources needed by the monitored VM 110h and determinable from the network traffic profile.
  • the optimal location may also take account of the available hosts, whether the available hosts are hardware accelerated and the locations of the hosts 108b-d in which the peer VMs 110b-d reside. It is noted that the term "location" with reference to hosts 108a-h and/or VMs 110a-h may refer to location within the topology of the datacenter network and may not be associated with a physical location.
  • Exemplary methods and apparatus may be arranged to determine a host 108a-p in which the monitored VM 110h may reside, based on the amount of packets/sec sent by the VM 110h and the capacity of a host 108a-p to provide the required amount of traffic.
  • Hosts with hardware-accelerated virtualization are known to handle higher amounts of traffic (e.g. packets/sec) and smaller packets.
  • Exemplary methods and apparatus may be arranged to determine a host 108a-p in which the monitored VM 110h may reside, based on the host's 108a-p proximity to one or more of the peer VMs 110b-d.
  • Figure 5 illustrates a scenario where a VM 110h resides on a host 108h.
  • a network traffic profile determined for VM 110h shows that it communicates with the peer VMs 110b-d. This is shown by the arrow 500.
  • the inventors have therefore defined an algorithm for determining the closeness of hosts 108a-p to each other. Obviously, more than one VM may be scheduled in a host (not shown in Figure 5). In its broadest sense, the determination of closeness comprises the inverse of the number of hops (links crossed) when travelling from a first host 108a-p to a second host 108a-p. Therefore, a closeness factor may be defined as closeness = 1 / hops, where hops is the number of links crossed. For the avoidance of doubt, it is noted that transmitting data from the host 108a to the edge switch 106a involves one hop and transmitting data from the host 108a to the host 108b involves two hops (one hop to the edge switch 106a and a second hop to the host 108b).
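  • The hop count and the resulting closeness factor can be computed from the scheduler's topology map with a breadth-first search, as in the following sketch over a simplified fragment of the Figure 1 tree. Node names follow the reference numerals; the graph representation itself is an assumption made for illustration. The asserts reproduce the one-hop and two-hop example above.

```python
from collections import deque

# Simplified fragment of the Figure 1 tree: each entry lists the directly
# connected nodes, so crossing one link equals one hop.
TOPOLOGY = {
    "102a": ["104a", "104b"],
    "104a": ["102a", "106a", "106b"],
    "104b": ["102a", "106c", "106d"],
    "106a": ["104a", "108a", "108b"],
    "106b": ["104a", "108c", "108d"],
    "106c": ["104b", "108e", "108f"],
    "106d": ["104b", "108g", "108h"],
    "108a": ["106a"], "108b": ["106a"], "108c": ["106b"], "108d": ["106b"],
    "108e": ["106c"], "108f": ["106c"], "108g": ["106d"], "108h": ["106d"],
}

def hops(topology: dict, src: str, dst: str) -> int:
    """Number of links crossed between two nodes (breadth-first search)."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in topology[node]:
            if neighbour == dst:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError("no path between nodes")

# One hop from host 108a to edge switch 106a, two hops from 108a to 108b.
assert hops(TOPOLOGY, "108a", "106a") == 1
assert hops(TOPOLOGY, "108a", "108b") == 2
closeness = 1 / hops(TOPOLOGY, "108a", "108b")  # 0.5
```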
  • the use of a closeness factor means that there is no requirement to prepare a matrix relating to all hosts in a datacenter.
  • the closeness factor may be a unique value attributable to a given host, based on the networking needs of a given VM. Further, the host in which a VM resides may be configured to monitor that VM, rather than having to monitor all VMs in the datacenter. This allows a particular VM to be scheduled efficiently with a reduced computational burden.
  • for a plurality of peer VMs, a closeness factor may be defined as closeness = 1 / Σ_i hops_i, where i is the index of the peer VM 110b-d and hops_i is the number of links crossed to reach the host of that peer VM.
  • This allows the closeness of a host 108a-p to a plurality of hosts 108b-d hosting the peer VMs 110b-d to be determined.
  • the closeness factor of host 108h to the hosts 108b-d is the inverse of the sum of the number of hops from host 108h to the host 108b, the number of hops from host 108h to the host 108c and the number of hops from host 108h to the host 108d.
  • a further refinement of the closeness factor may be implemented to allow it to be related to the amount of traffic sent to and/or received from the peer VMs 110b-d. Accordingly, the volume of traffic should influence the determination of the host 108a-p in which the monitored VM 110h should be placed. Therefore, the closeness factor may be weighted by the amount of traffic sent to and/or received from a peer VM 110b-d, for example closeness = 1 / Σ_i (w_i · hops_i), where i is the index of the peer VM 110b-d and w_i is a weight proportional to that amount of traffic.
  • Therefore, when considering the closeness of the host computers 108a and 108e-p that are not hosting peer VMs 110b-d to the host computers 108b-d that are hosting the peer VMs 110b-d, it can be seen that host 108a is the closest.
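  • Continuing the sketch above (it reuses TOPOLOGY and hops), the function below computes the closeness factor over several peer hosts and optionally weights each hop count by the traffic volume exchanged with that peer; the multiplicative weighting is one plausible form of the weighting described here, not a formula taken verbatim from the patent.

```python
def closeness_factor(topology, candidate_host, peer_hosts, traffic=None):
    """Closeness of a candidate host to the hosts of the peer VMs.

    Without weights this is 1 / (sum of hops to each peer host). When a
    traffic mapping (e.g. bytes/sec per peer host) is supplied, each hop
    count is multiplied by that volume; this weighting is assumed for
    illustration.
    """
    total = 0.0
    for peer in peer_hosts:
        weight = traffic.get(peer, 1.0) if traffic else 1.0
        total += weight * hops(topology, candidate_host, peer)
    return 1.0 / total

# Peer VMs 110b-d reside on hosts 108b-d; compare two candidate hosts.
peers = ["108b", "108c", "108d"]
print(closeness_factor(TOPOLOGY, "108a", peers))  # 1 / (2 + 4 + 4) = 0.1
print(closeness_factor(TOPOLOGY, "108e", peers))  # 1 / (6 + 6 + 6) = 0.0555...
```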
  • Figure 6 shows a datacenter network topology after the monitored VM 110h has been scheduled to be hosted by the host 108a. It is clear from Figure 6 that fewer hops are required for the VM 110h to transmit data to and receive data from the peer VMs 110b-d. No traffic is required to be sent over the links between the edge switch 106c and the aggregation switch 104b and no traffic is sent over the links between the aggregation switches 104a-b and the layer 2 switch 102a.
  • candidate hosts are determined initially.
  • the candidate hosts comprise hosts with enough computational resource and memory to run VM 110h. Hosts that have less network capacity than is needed to run VM 110h are removed from the list of candidate hosts to determine a reduced list of candidate hosts.
  • the amount of network capacity needed to run VM 110h may be determined from the network traffic profile for the VM 110h.
  • the closeness factor is determined, as set out above.
  • a host 108a-p is determined in which the VM 110h is to be hosted, based on the determined closeness factors.
  • the host 108a-p with the highest closeness factor is selected, thus grouping together VMs that communicate frequently with each other.
  • Figure 7 shows a flow chart for a method for scheduling a VM to a host computer.
  • the receiver 304 of the scheduling node 300 receives 700 the network traffic profile for the VM 110h.
  • a set of candidate hosts is selected 701 by the scheduler 310 of the scheduling node 300. This may be done by selecting a number of hosts, say ten, and the list of candidates may be determined based on the likely closeness of hosts to the peer VMs 110b-d, for example whether a host shares an edge switch 106a-h, an aggregation switch 104a-d or a layer 2 switch 102a-b with one or more of the peer VMs 110b-d.
  • the candidate list may be determined to include only hardware accelerated hosts.
  • the candidate hosts are considered to be 108a, e-g.
  • the scheduler 310 determines 702 whether the first host 108a of the candidate hosts 108a, e-g has sufficient network capacity to host the VM 110h. This may be done based on the received network traffic profile. If no, the candidate host 108a is removed 704 from the list of candidate hosts 108a, e-g. The scheduler 310 then determines 706 whether any more candidate hosts remain to be assessed for network capacity. If yes, steps 702-706 are repeated for the next host 108e. If no, then all candidate hosts have been assessed for network capacity and a reduced list of candidate hosts has been determined. For illustrative purposes, the reduced list is considered to be the same as the original list. That is, it is considered that all candidate hosts 108a, e-g have sufficient network capacity.
  • the scheduler 310 determines 708 the closeness of the candidate hosts 108a, e-g on the reduced list, as set out above.
  • the scheduler determines 710 the optimal host as the host on the reduced list having the highest closeness factor.
  • the optimal host is considered to be 108a.
  • the transmitter 302 transmits 712 instructions for the determined host 108a to host the monitored VM 110h as part of a VM migration process.
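  • The overall selection step of Figure 7 might look like the following sketch, which reuses closeness_factor and TOPOLOGY from the earlier sketches; the capacity units, the threshold and the dictionary shapes are assumptions made for illustration.

```python
def select_host(topology, candidates, peer_hosts, traffic,
                required_capacity, host_capacity):
    """Figure 7 style selection: drop candidates without enough spare
    network capacity, then pick the candidate with the highest closeness
    factor (capacity figures and dictionary shapes are illustrative)."""
    reduced = [h for h in candidates if host_capacity[h] >= required_capacity]
    if not reduced:
        raise RuntimeError("no candidate host has sufficient network capacity")
    return max(
        reduced,
        key=lambda h: closeness_factor(topology, h, peer_hosts, traffic),
    )

# Candidate hosts 108a and 108e-g; peer VMs on hosts 108b-d (Mbit/s figures).
capacity = {"108a": 800, "108e": 900, "108f": 900, "108g": 400}
best = select_host(
    TOPOLOGY,
    candidates=["108a", "108e", "108f", "108g"],
    peer_hosts=["108b", "108c", "108d"],
    traffic={"108b": 50.0, "108c": 20.0, "108d": 10.0},
    required_capacity=500,
    host_capacity=capacity,
)
print(best)  # 108a, matching the outcome shown in Figures 5 and 6
```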
  • Figure 8 shows a flow diagram for the profiling and scheduling of a VM in a host computer of a datacenter.
  • a monitor 210 of a host 108a-p monitors 800 network traffic associated with the VM 110h, as set out above.
  • the profiler 212 of the host 108a-p determines 802 a network traffic profile for the monitored VM 110h based on the monitored network traffic, as set out above.
  • the transmitter 202 of the host 108 transmits 804 the network traffic profile to the scheduling node 300.
  • the receiver 304 of the scheduling node 300 receives 806 the network traffic profile for the VM 110h.
  • a set of candidate hosts is selected 808 by the scheduler 310 of the scheduling node 300, as set out above.
  • the scheduler 310 determines 810 whether the first host 108a of the candidate hosts 108a, e-g has sufficient network capacity to host the VM 110h, as set out above. If no, the candidate host 108a is removed 812 from the list of candidate hosts 108a, e-g.
  • the scheduler 310 determines 814 whether any more candidate hosts remain to be assessed for network capacity. If yes, steps 810-814 are repeated for the next host 108e.
  • the scheduler 310 determines 816 the closeness of the candidate hosts 108a, e-g on the reduced list, as set out above.
  • the scheduler determines 818 the optimal host as the host on the reduced list having the highest closeness factor. As above, for illustrative purposes, the optimal host is considered to be 108a.
  • the transmitter 302 transmits 820 instructions for the determined host 108a to host the monitored VM 110h.
  • VMs 110a-p may be classified into two types.
  • Type 1, signalling VMs: virtual machines hosting signalling applications, where the volume of traffic sent/received is low.
  • Type 2, payload VMs: virtual machines hosting payload applications with high traffic volume and high throughput.
  • Type 1 VMs typically have low traffic volume and low throughput and may communicate with a large set of peer VMs, depending on the application type.
  • Type 2 VMs typically have high traffic volume and high throughput and typically communicate with a large set of peer VMs, either intra- or inter-datacenter.
  • the scheduler 310 may be configured to treat VMs of type 1 and type 2 differently.
  • Type 1 VMs may be configured to be scheduled to the closest available host 108a-p having sufficient network capacity.
  • Type 2 VMs may be configured to be scheduled to the closest available hardware accelerated host 108a-p having sufficient network capacity. Therefore, for Type 2 VMs, the scheduler 310 may be configured to determine a candidate host list comprising only hardware accelerated hosts.
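  • A simple way to express this two-type treatment is sketched below; the throughput threshold, the profile dictionary shape and the hw_accelerated flag are illustrative assumptions.

```python
def classify_vm(profile: dict, throughput_threshold: float = 1e8) -> int:
    """Type 2 (payload) when the aggregate byte rate exceeds a threshold,
    otherwise Type 1 (signalling). The threshold value is an assumption."""
    total_bytes_per_sec = sum(p["bytes_per_sec"] for p in profile.values())
    return 2 if total_bytes_per_sec > throughput_threshold else 1

def candidate_hosts(all_hosts: dict, vm_type: int) -> list:
    """Only hardware-accelerated hosts (e.g. SR-IOV capable) are kept as
    candidates for Type 2 VMs; Type 1 VMs may be placed on any host."""
    if vm_type == 2:
        return [h for h, info in all_hosts.items() if info["hw_accelerated"]]
    return list(all_hosts)
```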
  • the scheduler may be configured to use further profiling information provided by the profiler. For example, the scheduler may weight the closeness factor based on traffic priority settings thus placing the monitored VM closer to a peer VM whose traffic has higher priority settings. Other methods and apparatus include weighting the closeness factor with the RTT, thus improving perceived delay for the worst RTT results.
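  • One possible way to fold priority and RTT into the per-peer weight used by the closeness factor is sketched below; the particular linear combination and its coefficients are assumptions, not taken from the patent. The resulting per-peer values could then be passed as the traffic argument of closeness_factor in the earlier sketch.

```python
def combined_weight(stats: dict,
                    w_traffic: float = 1.0,
                    w_priority: float = 0.5,
                    w_rtt: float = 0.5) -> float:
    """One possible per-peer weight mixing traffic volume, priority and
    RTT; the linear form and coefficients are illustrative assumptions."""
    return (w_traffic * stats["bytes_per_sec"]
            + w_priority * stats.get("priority", 0)
            + w_rtt * stats.get("rtt_ms", 0.0))
```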
  • Exemplary methods and apparatus may determine a network traffic profile for one or more VMs at intervals while the datacenter is in operation. Therefore, the datacenter may be configured to reschedule VMs on the fly according to whether the network traffic profile has changed.
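  • A periodic re-profiling loop along these lines is sketched below, reusing changed_substantially from the earlier sketch; the interval, and the profiler and scheduler objects with their methods, are illustrative assumptions.

```python
import time

def rescheduling_loop(profiler, scheduler, interval_sec: int = 300):
    """Re-profile at intervals and request a migration only when the
    profile changed substantially (profiler/scheduler objects, their
    methods and the interval are illustrative assumptions)."""
    last_profile = profiler.profile()
    while True:
        time.sleep(interval_sec)
        current = profiler.profile()
        if changed_substantially(last_profile, current):
            scheduler.reschedule(profiler.monitored_ip, current)
            last_profile = current
```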
  • Exemplary methods and apparatus disclosed herein propose a scheduler that is network aware through the use of heuristics, taking into account the networking needs of each VM.
  • Existing solutions focus on compute, memory and storage requirements only. Instead of requiring the owner of the VM to determine the network needs a priori, methods and apparatus disclosed herein may automatically detect the network traffic profiles of VMs to determine the optimal hosts for VMs.
  • Methods and apparatus disclosed herein may adapt to the communication patterns of VMs by dynamically rescheduling VMs to hosts. This may optimise VM placement according to the peer VMs it communicates with.
  • methods and apparatus disclosed herein may create conglomerates of VMs that talk intensively to each other.
  • a computer program may be configured to provide any of the above described methods.
  • the computer program may be provided on a computer readable medium.
  • the computer program may be a computer program product.
  • the product may comprise a non-transitory computer usable storage medium.
  • the computer program product may have computer-readable program code embodied in the medium configured to perform the method.
  • the computer program product may be configured to cause at least one processor to perform some or all of the method.
  • These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
  • Computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer- readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks.
  • a tangible, non-transitory computer-readable medium may include an electronic, magnetic, optical, electromagnetic, or semiconductor data storage system, apparatus, or device.
  • Examples of such a medium include a portable computer diskette, a random access memory (RAM) circuit, a read-only memory (ROM) circuit, an erasable programmable read-only memory (EPROM or Flash memory) circuit, a portable compact disc read-only memory (CD-ROM), and a portable digital video disc read-only memory (DVD/Blu-ray).
  • the computer program instructions may also be loaded onto a computer and/or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer and/or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
  • the invention may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor, which may collectively be referred to as "circuitry", "a module" or variants thereof.

Abstract

Methods and apparatus for profiling a virtual machine (110h) and/or scheduling the virtual machine to a selected host computer (108a) in a datacenter. A system comprises a host computer (108a-p) comprising a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine. The host computer comprises a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates. The host computer comprises a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine. The scheduling node comprises a receiver (304) configured to receive a network traffic profile for the virtual machine. The scheduling node comprises a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The scheduling node comprises a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.

Description

SCHEDULING OF VIRTUAL MACHINES
Technical field
This invention relates to methods and apparatus for scheduling of virtual machines (VMs) in host computers (hosts) in a datacenter. Specifically, the invention may relate to, but is not limited to, scheduling of VMs in a cloud datacenter.
Background
Datacenters often make use of virtualisation technologies, which may define a plurality of VMs capable of undertaking particular tasks within the datacenter. VMs may exist in any of a plurality of hosts (e.g., servers) that form a part of the datacenter and the location of one or more VMs within one or more hosts affects the speed of operations and the amount of network traffic within a datacenter.
Many cloud datacenter solutions comprise a scheduler for determining where a VM should be executed (i.e., which host computer). Those schedulers range from very simple schedulers that, for example, randomly pick a host, to very advanced schedulers that, for example, hold multiple sub-schedulers. VM scheduling directly impacts the efficiency of a datacenter and the performance of running VMs within the datacenter.
In known solutions, for example, a scheduler may implement different scheduling policies (sometimes called scheduling objectives), depending on the objective of the user and cloud provider. A packing policy minimizes the number of hosts in use. A stripping policy maximizes the resources available to a VM by spreading the VMs across a plurality of hosts. A load-aware policy maximizes the resources available to a VM by using hosts with greater available capacity. Existing technology focuses on the allocation of computation resources and memory. When deploying payload applications, which process the payload of data packets, networking resources are more important. This is the case for most telecommunications and network related products. A VM placement algorithm is disclosed in "Online traffic-aware virtual machine placement in data center networks", Dias & Costa. Traffic between all VMs in a datacenter is used to create a traffic matrix. The datacenter network is modelled as a graph, and the traffic matrix is used to cluster VMs that exchange traffic. In "CloudMirror: Application-Aware Bandwidth Reservations in the Cloud", Lee et al. disclose an abstraction for specifying bandwidth guarantees as a graph between components in a network.
A method of achieving more efficient allocation of resources within a datacenter is desirable. This is even truer for telecommunication applications deployed in a datacenter, as such applications are commonly payload applications. Examples of telecommunications applications deployed in datacenters include evolved packet gateways, routers, network proxies and middleboxes, amongst others.
It is advantageous to determine the optimal host machine in which a given VM may be located.
Summary
According to the invention in a first aspect, there is provided a scheduling node (300) for scheduling a virtual machine (110h) to a host computer (108a) in a datacenter. The scheduling node comprises a receiver (304) configured to receive a network traffic profile for the virtual machine, wherein the network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates. The scheduling node comprises a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The scheduling node comprises a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.
Optionally, the closeness factor for the one or more candidate host computers (108a, e-g) comprises the inverse of a sum of a number of hops required to transmit data from a candidate host computer to each of a plurality of peer virtual machines (110b-d). Optionally, the network traffic profile further comprises an amount of network traffic sent to and/or received from the one or more peer virtual machines (110b-d), and wherein the closeness factor for the one or more candidate host computers (108a, e-g) is weighted based on one or more of: the amount of network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d); a priority associated with network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d); and a round trip time for network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d). Optionally, the scheduler (310) is configured to determine the closeness factor for each of a plurality of candidate host computers (108a, e-g), and wherein the selected host computer (108a) is selected from the plurality of candidate host computers based at least in part on the closeness factor. Optionally, each of the one or more candidate host computers (108a, e-g) comprises sufficient available resources to host the virtual machine (110h).
Optionally, one or more of the candidate host computers (108a, e-g) is a hardware accelerated host computer.
Optionally, the scheduler (310) is configured to determine a demand for traffic throughput for the virtual machine (110h), and to determine that the virtual machine is to be hosted in a hardware accelerated host computer if the determined demand for traffic throughput is above a threshold value.
According to the invention in a second aspect, there is provided a method for scheduling a virtual machine (1 1 Oh) to a host computer (108a) in a datacenter. The method comprises receiving (700), by a receiver (304) of a scheduling network node (300), a network traffic profile for the virtual machine. The network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The method comprises determining (710), by a scheduler (310), a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The method comprises transmitting (712), by a transmitter (302), instructions for the determined host computer to host the virtual machine.
According to the invention in a third aspect, there is provided a non-transitory computer readable medium (309) comprising computer readable code configured, when read and executed by a computer, to carry out the method described above.
According to the invention in a fourth aspect, there is provided a host computer (108a- p) for determining a network traffic profile of a virtual machine (1 1 Oh) in a datacenter. The host computer comprises a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine. The host computer comprises a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The host computer comprises a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.
Optionally, the network traffic profile comprises an amount of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).
Optionally, the network traffic profile comprises a rate of transmission and/or reception of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).
Optionally, the network traffic profile comprises an indication of a round trip time for the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d). Optionally, the network traffic profile comprises data indicating a burstiness of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).
Optionally, the network traffic profile comprises data indicating a priority associated with monitored network traffic. According to the invention in a fifth aspect, there is provided a method for determining a network traffic profile of a virtual machine (1 1 Oh) in a datacenter. The method comprises monitoring (400), by a monitor (210), network traffic sent to and/or received from the virtual machine. The method comprises determining (402), by a profiler (212), a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The method comprises transmitting (404), by a transmitter (202), the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.
According to the invention in a sixth aspect, there is provided a non-transitory computer readable medium (209) comprising computer readable code configured, when read and executed by a computer, to carry out the method described above.
According to the invention in a seventh aspect, there is provided a system for profiling a virtual machine (110h) and scheduling the virtual machine to a selected host computer (108a) in a datacenter. The system comprises a host computer (108a-p) comprising a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine. The host computer comprises a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates. The host computer comprises a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine. The scheduling node comprises a receiver (304) configured to receive a network traffic profile for the virtual machine. The scheduling node comprises a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The scheduling node comprises a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.
According to the invention in an eighth aspect, there is provided a method for profiling a virtual machine (110h) and scheduling the virtual machine to a host computer (108a) in a datacenter. The method comprises monitoring (800), by a monitor (210) of a host computer (108a-p), network traffic sent to and/or received from the virtual machine. The method comprises determining (802), by a profiler (212) of the host computer, a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates. The method comprises transmitting (804), by a transmitter (202) of the host computer, the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine. The method comprises receiving (806), by a receiver (304) of the scheduling node, a network traffic profile for a virtual machine. The method comprises determining (818), by a scheduler (310) of the scheduling node, a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The method comprises transmitting (820), by a transmitter (302) of the scheduling node, instructions to the determined host computer to host the virtual machine.
According to the invention in a ninth aspect, there is provided a non-transitory computer readable medium comprising computer readable code configured, when read and executed by a computer, to carry out the method described above.
Brief description of the drawings
Exemplary methods and apparatus are described herein with reference to the accompanying drawings, in which:
Figure 1 is a schematic of a datacenter topology;
Figure 2 is a schematic of a host computer;
Figure 3 is a schematic of a scheduling node;
Figure 4 is a flow diagram of a method for determining a network traffic profile of a virtual machine (110h) in a datacenter;
Figure 5 is a schematic of a datacenter topology;
Figure 6 is a schematic of a datacenter topology;
Figure 7 is a flow diagram of a method for scheduling a virtual machine (1 10h) to a host computer (108a) in a datacenter; and
Figure 8 is a flow diagram of a method for profiling a virtual machine (1 1 Oh) and scheduling the virtual machine to a host computer (108a) in a datacenter.
Description
Generally, disclosed herein are methods and apparatus for profiling and/or scheduling virtual machines within a datacenter of host computers (e.g. servers). Exemplary methods and apparatus may determine network capacity and location for a plurality of candidate hosts and schedule a VM to one of the plurality of candidate hosts based on the determined network capacity and/or location. This may provide the advantage of reduced network traffic within a datacenter.
Methods and apparatus disclosed herein leverage recent advancements in hardware accelerated virtualization. An existing technology for hardware-accelerated virtualized networking is single-root input/output virtualization (SR-IOV). SR-IOV provides a method of virtualizing a peripheral component interconnect (PCI) card and/or peripheral component interconnect express (PCIe) card, which means that a single PCI card can present itself as many virtual PCI cards. An SR-IOV network card possesses independent virtual functions (VFs), each of which behaves just like an independent network interface controller (NIC) for a VM. This allows VMs to have a dedicated virtual NIC in hardware, with a higher throughput of network traffic than through a virtual switch (in software).
The inventors have appreciated that if payload applications are deployed in datacenters in which VM scheduling is based on the allocation of computation resources and memory, there is no way to demand a given network capacity for that application (VM). Applications may then underperform, time-out and/or crash, depending on the nature of the protocols involved. Moreover, the inventors have appreciated that once applications are running in a datacenter, scheduling should be adaptable to the networking needs and communication patterns of its applications.
Methods and apparatus disclosed herein propose heuristic scheduling of VMs in hosts of a datacenter. Such heuristics may be based on a determined network traffic profile of a VM. The network traffic profile may indicate the networking needs of running one or more VMs in one or more hosts in a datacenter, e.g., measured in terms of data packets transmitted/received, possibly over a given time period. The network traffic profile may be determined in an automated fashion by one or more profilers in one or more hosts by, for example, a hypervisor application. The one or more profilers configured to determine the network traffic profile may advantageously be located in one or more of a plurality of hosts. In specific exemplary methods and apparatus, a profiler may be located in every host hypervisor where a VM may need to be profiled. A network traffic profile of a VM may be transmitted to a scheduling node of the datacenter comprising a scheduler. The scheduling node possesses a topology map of the underlying network infrastructure and is configured to determine a host computer that should host the VM based on the network traffic profile. It is noted that in some exemplary methods and apparatus, the host configured to monitor the VM and determine the network traffic profile may be collocated with the scheduling node and may be the same node of the datacenter.
Hosts with enhanced capabilities may be present in a datacenter. An example of such hosts includes those with hardware accelerated network virtualization (e.g., SR-IOV or Virtual Machine Device Queues (VMDq)). VMs with stringent networking requirements, as determined by monitoring network traffic associated with a VM for determining a network traffic profile, may be allocated hosts with better networking features. This may allow for better execution of payload applications.
Exemplary methods and apparatus disclosed herein may group VMs that communicate a lot with each other. That is, a VM will communicate commonly with a plurality of peer VMs and the VM may be located in close proximity to such peer VMs. By doing so, the amount of network traffic in a datacenter may be mitigated and unnecessary overloading of the datacenter links may be avoided. Methods and apparatus disclosed herein may use a notion of "closeness" to determine a closeness factor, as defined herein, to determine whether a host is in close proximity to peer VMs (or host computers hosting peer VMs). The closeness factor may be weighted by the amount of traffic that the monitored VM sends to and receives from each peer VM. Exemplary methods and apparatus may comprise one or more of the following:
• Determining a network traffic profile of a VM running on a first host by monitoring network traffic by a hypervisor function
• Transmitting the network traffic profile information to a scheduling node comprising a network aware scheduler
• Determining whether there is a better host in which to run the VM, e.g., a hardware accelerated host
• Scheduling the VM close to other VMs it communicates with. These are from here on called peer VMs.
Figure 1 shows a typical structure of a datacenter network comprising a plurality of nodes 100, 102a-b, 104a-d, 106a-h, 108a-p. A datacenter border gateway 100 may control access for the datacenter network to the Internet. The datacenter border gateway 100 may therefore be a router. The datacenter border gateway is in electrical communication with two layer 2 (of the open systems interconnection (OSI) model) switches 102a-b. The layer 2 switches 102a-b are each in electrical communication with four aggregation switches 104a-d. The aggregation switches 104a-d are each in electrical communication with two edge (or access) switches 106a-h. The edge switches 106a-h are each in electrical communication with two host computers 108a-p, which are configured to host one or more VMs 110a-p. For clarity, not all of the edge switches 106a-h, hosts 108a-p and VMs 110a-p are individually referenced, but these features of Figure 1 run consecutively from left to right, with "a" on the left hand side.
This typical arrangement of a datacenter network is also known as a tree structure. Other arrangements include VL2, fat-tree and B-Cube. It is noted that the solution presented herein can also be applied to those other types of arrangements, and more. The topological map of the datacenter network is stored in the scheduler.
It is noted that the numbers of each of the nodes 100, 102a-b, 104a-d, 106a-h, 108a-p of the datacenter topology shown in Figure 1 are for illustrative purposes only. Each node may be in electrical connection with more or fewer nodes than shown in Figure 1. Further, it is noted that the structure of Figure 1 is a hierarchical structure, in which hosts 108a-p may be considered the leaves of the network. Hosts 108a-p can have different network cards, features and capacity. Hosts 108a-p may have support for hardware-accelerated network virtualization.
The majority of network traffic in a datacenter stays within the datacenter. This is termed east-west traffic. A few VMs 110a-p may communicate with the external world, through core switches and the datacenter border gateway 100, but it is estimated that approximately 70% of traffic is internal to the datacenter. The methods and apparatus disclosed herein may schedule a number of VMs 110a-p that communicate a lot with each other in close proximity, turning traffic as early as possible in the network hierarchy, thus sparing datacenter links and reducing the amount of network traffic within the datacenter.
Figure 2 shows a schematic representation of a host computer 108. The methods and apparatus disclosed herein permit the host computer 108 to monitor network traffic associated with a VM to determine a network traffic profile for the VM. The host 108 comprises a transmitter 202 and a receiver 204, which form part of a communication unit 205. The transmitter 202 and receiver 204 are in electrical communication with other nodes and/or functions in a datacenter and are configured to transmit and receive data accordingly.
As used herein, the term "network traffic profile" encompasses data relating to the type (e.g., TCP or UDP), rate (e.g., packets/sec and/or bytes/sec), burstiness, priority settings, round trip time (RTT), source and/or destination of network traffic sent from and received by a VM. In specific methods and apparatus, as set out below, the network traffic profile may comprise addresses of peer VMs 110 with which a VM communicates, an amount of network traffic sent from and/or received by the VM and/or an indication of the rate of data sent from and/or received by the VM (e.g., in packets/sec and bytes/sec). However, the solution is not limited to only those parameters; e.g., RTT and priority settings may be used as well.
The host 108 further comprises a memory 206 and a processor 208. The memory 206 may comprise non-volatile memory and/or volatile memory. The memory 206 may have a computer program 207 stored therein. The computer program 207 may be configured to undertake the methods disclosed herein. The computer program 207 may be loaded in the memory 206 from a non-transitory computer readable medium 209, on which the computer program is stored. The processor 208 is configured to undertake the functions of a monitor 210 and a profiler 212.
Each of the transmitter 202, receiver 204, communications unit 205, memory 206, processor 208, monitor 210 and profiler 212 is in electrical communication with the other features 202, 204, 205, 206, 208, 210, 212 of the host 108. The host 108 can be implemented as a combination of computer hardware and software. In particular, the monitor 210 and profiler 212 may be implemented as software configured to run on the processor 208. The memory 206 stores the various programs/executable files that are executed in the processor 208, and also provides a storage unit for any required data. The programs/executable files stored in the memory 206, and executed in the processor 208, can include the monitor 210 and the profiler 212, but are not limited to such.
Figure 3 shows a schematic representation of a scheduling node 300 of a datacenter network configured to schedule a VM to a host 108 in accordance with the methods and apparatus disclosed herein. Specifically, the scheduling node 300 may be configured to receive a network traffic profile and schedule a VM to a host 108, based on that received profile. The scheduling node 300 may form part of any of the nodes 100, 102, 104, 106, 108 of the datacenter topology shown in Figure 1. Alternatively, the scheduling node 300 may be a separate node that is independent from the topology shown in Figure 1, typically part of a datacenter Cloud Management System. It is noted that in some methods and apparatus, the scheduling node 300 may be the same node as the host computer 108 configured to monitor the VM and determine the network traffic profile, as set out below. In such methods and apparatus, the transmission of the network traffic profile to the scheduling node 300 may be an internal transmission within a single node. In other methods and apparatus, the scheduling node 300 may be deployed in a virtual machine that may be physically deployed in any of the hosts. The scheduling node 300 comprises a transmitter 302 and a receiver 304, which form part of a communication unit 305. The transmitter 302 and receiver 304 are in electrical communication with other nodes and/or functions in a datacenter and are configured to transmit and receive data accordingly.
The scheduling node 300 further comprises a memory 306 and a processor 308. The memory 306 may comprise non-volatile memory and/or volatile memory. The memory 306 may have a computer program 307 stored therein. The computer program 307 may be configured to undertake the methods disclosed herein. The computer program 307 may be loaded in the memory 306 from a non-transitory computer readable medium 309, on which the computer program is stored. The processor 308 is configured to undertake the functions of a scheduler 310.
Each of the transmitter 302, receiver 304, communications unit 305, memory 306, processor 308 and scheduler 310 is in electrical communication with the other features 302, 304, 305, 306, 308, 310 of the scheduling node 300. The scheduling node 300 can be implemented as a combination of computer hardware and software. In particular, the scheduler 310 may be implemented as software configured to run on the processor 308. The memory 306 stores the various programs/executable files that are executed in a processor 308, and also provides a storage unit for any required data. The programs/executable files stored in the memory 306, and executed in the processor 308, can include the scheduler 310, but are not limited to such.
The methods and apparatus disclosed herein are capable of profiling the networking needs of one or more VMs 1 10a-p. The objective is to perform characterisation of the network traffic pattern sent and/or received by a VM 1 10a-p. This is done through the use of enhanced counters that may be added to a hypervisor function running on a host 108a-p. The monitor 210 and profiler 212 may form part of a hypervisor. Existing counters typically monitor a total amount of traffic in/out of a network card.
Referring back to Figure 1, a VM 110h is considered to be a monitored VM and the VMs 110b-d are considered to be peer VMs with which the monitored VM communicates. The monitor 210 may comprise an enhanced counter configured to log the source and/or destination of network traffic received and/or sent by the monitored VM 110h. The monitor 210 may also comprise enhanced counters configured to determine an amount of network traffic sent to and/or received from each of a plurality of peer VMs 110b-d, which may be averaged over a period of time (packets/sec and bytes/sec).
Based on the monitored network traffic, the profiler 212 may be configured to output a list of the peer VMs 110b-d with which a monitored VM 110h communicates. This list may contain the IP addresses of the peer VMs 110b-d and the average amount of network traffic (packets/sec and bytes/sec) sent to and/or received from each peer VM 110b-d. The profiler 212 may also be configured to determine the rate and/or burstiness of network traffic sent to and/or received from each peer VM 110b-d, and a round trip time (RTT) for network traffic sent to and/or received from each peer VM 110b-d. Other networking related parameters may also be monitored and determined.
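As a minimal sketch only, the enhanced counters and the resulting per-peer list might be implemented along the following lines; the class and field names are illustrative and not part of the disclosed apparatus, and packet interception itself is left to the hypervisor:

```python
from collections import defaultdict

class VmTrafficProfiler:
    """Aggregates per-peer counters for one monitored VM and builds its profile."""

    def __init__(self, report_threshold_bps=1000.0):
        self.packet_counts = defaultdict(int)  # peer IP address -> packets seen
        self.byte_counts = defaultdict(int)    # peer IP address -> bytes seen
        self.report_threshold_bps = report_threshold_bps

    def record_packet(self, peer_address, size_bytes):
        # Called by the hypervisor monitor for every packet sent to or received
        # from the monitored VM; peer_address identifies the other endpoint.
        self.packet_counts[peer_address] += 1
        self.byte_counts[peer_address] += size_bytes

    def build_profile(self, interval_seconds):
        # Average the counters over the monitoring interval and drop peers whose
        # traffic volume falls below the reporting threshold before the profile
        # is sent to the scheduling node.
        profile = {}
        for peer, byte_count in self.byte_counts.items():
            bytes_per_sec = byte_count / interval_seconds
            if bytes_per_sec < self.report_threshold_bps:
                continue
            profile[peer] = {
                "packets_per_sec": self.packet_counts[peer] / interval_seconds,
                "bytes_per_sec": bytes_per_sec,
            }
        return profile
```

For example, calling build_profile(60) would report sixty-second averages for each peer whose traffic exceeds the chosen threshold, which also reflects the filtering technique described below.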
The VMs 110a-h may be profiled at specific times while the datacenter is in operation, and the information fed to the scheduler 310 of the scheduling node 300. The profiler 212 may apply certain techniques to minimize the amount of data sent to the scheduler 310. Examples include filtering out peer VMs 110b-d with a traffic volume below a certain threshold, and reporting to the scheduler 310 only when substantial changes in a network traffic profile are determined.
Figure 4 shows a method for determining a network traffic profile for a VM 110h. A monitor 210 of a host 108a-p monitors 400 network traffic associated with the VM 110h. The monitor may therefore monitor one or more of the destination address, source address, rate and amount of the data sent to and/or received from the monitored VM 110h. This may be done by intercepting such traffic and receiving it at the receiver 204 of the host 108a-p. Therefore, it is advantageous for the monitor 210 configured to monitor the traffic of the VM 110h to be in the host 108h in which the VM 110h resides. The profiler 212 of the host 108a-p determines 402 a network traffic profile for the monitored VM 110h based on the monitored network traffic. The network traffic profile represents information indicating the networking needs of the monitored VM 110h. The profile indicates the peer VMs 110b-d with which the monitored VM communicates and the amount of communication with each, which may be used to determine the best location for the monitored VM. Further, the network traffic profile may indicate the rate (e.g., in packets/sec and/or bytes/sec), burstiness, priority settings and RTT of the network traffic, all of which may be used to determine the hardware requirements of a host 108 in which the VM 110h is to reside.
The transmitter 202 of the host 108 transmits 404 the network traffic profile to the scheduling node 300.
The network traffic profile information may be used to schedule the monitored VM 1 10h to reside in an optimal location (host 108a-p). The optimal location may take into account the network resources needed by the monitored VM 1 1 Oh and determinable from the network traffic profile. The optimal location may also take account of the available hosts, whether the available hosts are hardware accelerated and the locations of the hosts 108b-d in which the peer VMs 1 10b-d reside. It is noted that the term "location" with reference to hosts 108a-h and/or VMs 1 10a-h may refer to location within the topology of the datacenter network and may not be associated with a physical location.
Exemplary methods and apparatus may be arranged to determine a host 108a-p in which the monitored VM 110h may reside, based on the amount of packets/sec sent by the VM 110h and the capacity of a host 108a-p to provide the required amount of traffic. Hosts with hardware-accelerated virtualization are known to handle higher amounts of traffic (e.g., packets/sec) and smaller packets.
Exemplary methods and apparatus may be arranged to determine a host 108a-p in which the monitored VM 110h may reside, based on the host's 108a-p proximity to one or more of the peer VMs 110b-d.
Figure 5 illustrates a scenario where a VM 110h resides on a host 108h. A network traffic profile determined for VM 110h shows that it communicates with the peer VMs 110b-d. This is shown by the arrow 500. Intuitively, it is possible to determine from Figure 5 that if resources were available at the host 108a, network links could be spared by moving VM 110h to host 108a. Even though this is quite intuitive when looking at Figure 5, it is not easily determined by a computer system. The inventors have therefore defined an algorithm for determining the closeness of hosts 108a-p to each other. Obviously, more than one VM may be scheduled in a host (not shown in Figure 5). In its broadest sense, the determination of closeness comprises the inverse of the number of hops (links crossed) when travelling from a first host 108a-p to a second host 108a-p. Therefore, a closeness factor may be defined as:
closeness = 1 / hops
Where "hops" is the number of links crossed. For the avoidance of doubt, it is noted that transmitting data from the host 108a to the edge switch 106a involves one hop and transmitting data from the host 108a to the host 108b involves two hops (one hop to the edge switch 106a and a second hop to the host 108b).
The use of such a closeness factor means that there is no requirement to prepare a matrix relating to all hosts in a datacenter. The closeness factor may be a unique value attributable to a given host, based on the networking needs of a given VM. Further, the host in which a VM resides may be configured to monitor that VM, rather than having to monitor all VMs in the datacenter. This allows a particular VM to be scheduled efficiently with a reduced computational burden.
However, there are typically a plurality of peer VMs 110b-d, as shown in Figure 5. Therefore, the closeness algorithm may be refined to sum all the hops required to transmit data from a first host 108a-p to a second host 108a-p hosting a first peer VM 110b-d, and from the first host 108a-p to a third host 108a-p hosting a second peer VM 110b-d. Therefore, a closeness factor may be defined as:
closeness = 1 / Σ_i hops(i)
Where "i" is the index of the peer VM 110b-d. This allows the closeness of a host 108a-p to a plurality of hosts 108b-d hosting the peer VMs 110b-d to be determined. For example, the closeness factor of host 108h to the hosts 108b-d is the inverse of the sum of the number of hops from host 108h to the host 108b, the number of hops from host 108h to the host 108c and the number of hops from host 108h to the host 108d.
A further refinement of the closeness factor may be implemented to allow it to be related to the amount of traffic sent to and/or received from the peer VMs 110b-d. Accordingly, the volume of traffic should influence the determination of the host 108a-p in which the monitored VM 110h should be placed. Therefore, the closeness factor may be weighted by the amount of traffic sent to and/or received from a peer VM 110b-d:
closeness = 1 / Σ_i (hops(i) × traffic(i))
Where "i" is the index of the peer VM 110b-d. Therefore, when considering the closeness of the host computers 108a and 108e-p that are not hosting peer VMs 110b-d to the host computers 108b-d that are hosting the peer VMs 110b-d, it can be seen that host 108a is the closest.
Figure 6 shows the datacenter network topology after the monitored VM 110h has been scheduled to be hosted by the host 108a. It is clear from Figure 6 that fewer hops are required for the VM 110h to transmit data to and receive data from the peer VMs 110b-d. No traffic is required to be sent over the links between the edge switch 106c and the aggregation switch 104b, and no traffic is sent over the links between the aggregation switches 104a-b and the layer 2 switch 102a.
In exemplary methods and apparatus, candidate hosts are determined initially. The candidate hosts comprise hosts with enough computational resource and memory to run the VM 110h. Hosts that have less network capacity than is needed to run the VM 110h are removed from the list of candidate hosts, to determine a reduced list of candidate hosts. The amount of network capacity needed to run the VM 110h may be determined from the network traffic profile for the VM 110h. For each of the hosts 108a-p on the reduced list, the closeness factor is determined, as set out above. A host 108a-p is determined in which the VM 110h is to be hosted, based on the determined closeness factors. In exemplary methods and apparatus, the host 108a-p with the highest closeness factor is selected, thus grouping together VMs that communicate a lot with each other.
Figure 7 shows a flow chart for a method for scheduling a VM to a host computer. The receiver 304 of the scheduling node 300 receives 700 the network traffic profile for the VM 110h. A set of candidate hosts is selected 701 by the scheduler 310 of the scheduling node 300. This may be done using a set number of hosts, say ten, and the list of candidates may be determined based on the likely closeness of hosts to the peer VMs 110b-d. For example, a host may be included in the list if it shares an edge switch 106a-h, an aggregation switch 104a-d or a layer 2 switch 102a-b with one or more of the peer VMs 110b-d. In exemplary methods and apparatus, the candidate list may be determined to include only hardware accelerated hosts. For illustrative purposes, the candidate hosts are considered to be 108a, e-g.
The scheduler 310 determines 702 whether the first host 108a of the candidate hosts 108a, e-g has sufficient network capacity to host the VM 110h. This may be done based on the received network traffic profile. If no, the candidate host 108a is removed 704 from the list of candidate hosts 108a, e-g. The scheduler 310 then determines 706 whether any more candidate hosts remain to be assessed for network capacity. If yes, steps 702-706 are repeated for the next host 108e. If no, then all candidate hosts have been assessed for network capacity and a reduced list of candidate hosts has been determined. For illustrative purposes, the reduced list is considered to be the same as the original list. That is, it is considered that all candidate hosts 108a, e-g have sufficient network capacity.
The scheduler 310 determines 708 the closeness of the candidate hosts 108a, e-g on the reduced list, as set out above. The scheduler determines 710 the optimal host as the host on the reduced list having the highest closeness factor. For illustrative purposes, the optimal host is considered to be 108a. The transmitter 302 transmits 712 instructions for the determined host 108a to host the monitored VM 110h as part of a VM migration process.
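Taken together, the steps of Figure 7 might be sketched as below, reusing the closeness_factor sketch above; the capacity check is abstracted behind a caller-supplied predicate, and all names are illustrative assumptions rather than a definitive implementation:

```python
def schedule_vm(profile, candidate_hosts, peer_hosts, hops, has_capacity):
    """
    profile:         peer VM address -> traffic volume with the VM being scheduled
    candidate_hosts: hosts with enough computational resource and memory for the VM
    peer_hosts:      peer VM address -> host currently running that peer VM
    hops:            function (host_a, host_b) -> number of links crossed
    has_capacity:    function (host, profile) -> True if the host can carry the
                     network load implied by the profile
    Returns the host with the highest closeness factor, or None if none qualify.
    """
    # Steps 702-706: remove candidates without sufficient network capacity.
    reduced_list = [h for h in candidate_hosts if has_capacity(h, profile)]

    # Steps 708-710: compute the closeness factor for each remaining candidate
    # and select the host with the highest value.
    best_host, best_score = None, float("-inf")
    for host in reduced_list:
        score = closeness_factor(host, peer_hosts, profile, hops)
        if score > best_score:
            best_host, best_score = host, score
    return best_host
```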
Figure 8 shows a flow diagram for the profiling and scheduling of a VM in a host computer of a datacenter. A monitor 210 of a host 108a-p monitors 800 network traffic associated with the VM 1 10h, as set out above. The profiler 212 of the host 108a-p determines 802 a network traffic profile for the monitored VM 1 1 Oh based on the monitored network traffic, as set out above. The transmitter 202 of the host 108 transmits 804 the network traffic profile to the scheduling node 300.
The receiver 304 of the scheduling node 300 receives 806 the network traffic profile for the VM 110h. A set of candidate hosts is selected 808 by the scheduler 310 of the scheduling node 300, as set out above. The scheduler 310 determines 810 whether the first host 108a of the candidate hosts 108a, e-g has sufficient network capacity to host the VM 110h, as set out above. If no, the candidate host 108a is removed 812 from the list of candidate hosts 108a, e-g. The scheduler 310 then determines 814 whether any more candidate hosts remain to be assessed for network capacity. If yes, steps 810-814 are repeated for the next host 108e. If no, then all candidate hosts have been assessed for network capacity and a reduced list of candidate hosts has been determined. As above, for illustrative purposes, the reduced list is considered to be the same as the original list. The scheduler 310 determines 816 the closeness of the candidate hosts 108a, e-g on the reduced list, as set out above. The scheduler determines 818 the optimal host as the host on the reduced list having the highest closeness factor. As above, for illustrative purposes, the optimal host is considered to be 108a. The transmitter 302 transmits 820 instructions for the determined host 108a to host the monitored VM 110h.
In exemplary methods and apparatus, based on the information in the network traffic profile, VMs 1 10a-p may be classified into two types.
1 . Signalling VMs: these are virtual machines hosting signalling applications, where the volume of traffic sent/received is low
2. Payload VMs: these are virtual machines hosting payload applications with high traffic volume and high throughput
Type 1 VMs typically have low traffic volume and low throughput and may communicate with a large set of peer VMs, depending on the application type. Type 2 VMs typically have high traffic volume and high throughput and typically communicate with a large set of peer VMs, either intra- or inter-datacenter. The scheduler 310 may be configured to treat VMs of type 1 and type 2 differently. Type 1 VMs may be scheduled to the closest available host 108a-p having sufficient network capacity. Type 2 VMs may be scheduled to the closest available hardware accelerated host 108a-p having sufficient network capacity. Therefore, for Type 2 VMs, the scheduler 310 may be configured to determine a candidate host list comprising only hardware accelerated hosts.
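One way the distinction might be made in practice is sketched below, assuming an operator-chosen throughput threshold; the threshold value and helper names are illustrative only and not taken from the disclosure:

```python
PAYLOAD_THRESHOLD_BPS = 100e6  # illustrative threshold, chosen by the operator

def classify_vm(profile):
    """Classify a VM as 'payload' (type 2) or 'signalling' (type 1) from its profile."""
    total_bps = sum(peer["bytes_per_sec"] for peer in profile.values())
    return "payload" if total_bps > PAYLOAD_THRESHOLD_BPS else "signalling"

def candidate_hosts_for(vm_type, hosts, is_hw_accelerated):
    # Payload VMs are restricted to hardware accelerated hosts (e.g. SR-IOV capable);
    # signalling VMs may be scheduled to any host with sufficient capacity.
    if vm_type == "payload":
        return [h for h in hosts if is_hw_accelerated(h)]
    return list(hosts)
```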
The scheduler may be configured to use further profiling information provided by the profiler. For example, the scheduler may weight the closeness factor based on traffic priority settings, thus placing the monitored VM closer to a peer VM whose traffic has higher priority settings. Other methods and apparatus include weighting the closeness factor with the RTT, thus improving perceived delay for the worst RTT results.
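How such additional weights are folded into the closeness factor is a design choice; the combination below is one assumption among many and simply reuses per-peer fields of the network traffic profile:

```python
def weighted_closeness(candidate_host, peers, hops):
    """
    peers: peer VM address -> dict holding the 'host' running that peer and the
           'traffic', 'priority' and 'rtt_ms' values reported in the network
           traffic profile. The way the weights are combined is illustrative only.
    """
    total = 0.0
    for info in peers.values():
        # Give more weight (and therefore more pull) to peers with high traffic,
        # high priority traffic, or a long round trip time.
        weight = info["traffic"] * (1 + info["priority"]) * (1 + info["rtt_ms"])
        total += hops(candidate_host, info["host"]) * weight
    return float("inf") if total == 0 else 1.0 / total
```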
Exemplary methods and apparatus may determine a network traffic profile for one or more VMs at intervals while the datacenter is in operation. Therefore, the datacenter may be configured to reschedule VMs on the fly according to whether the network traffic profile has changed.
Exemplary methods and apparatus disclosed herein propose a scheduler that is network aware through the use of heuristics, taking into account the networking needs of each VM. Existing solutions focus on compute, memory and storage requirements only. Instead of demanding that the owner of the VM determine the network needs a priori, methods and apparatus disclosed herein may automatically detect the network traffic profiles of VMs to determine the optimal hosts for VMs. Methods and apparatus disclosed herein may adapt to the communication patterns of VMs by dynamically rescheduling VMs to hosts. This may optimise the placement of a VM according to the peer VMs with which it communicates. In the end, methods and apparatus disclosed herein may create conglomerates of VMs that talk intensively to each other.
A computer program may be configured to provide any of the above described methods. The computer program may be provided on a computer readable medium. The computer program may be a computer program product. The product may comprise a non-transitory computer usable storage medium. The computer program product may have computer-readable program code embodied in the medium configured to perform the method. The computer program product may be configured to cause at least one processor to perform some or all of the method.
Various methods and apparatus are described herein with reference to block diagrams or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
Computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer- readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. A tangible, non-transitory computer-readable medium may include an electronic, magnetic, optical, electromagnetic, or semiconductor data storage system, apparatus, or device. More specific examples of the computer-readable medium would include the following: a portable computer diskette, a random access memory (RAM) circuit, a read-only memory (ROM) circuit, an erasable programmable read-only memory (EPROM or Flash memory) circuit, a portable compact disc read-only memory (CD- ROM), and a portable digital video disc read-only memory (DVD/Blu-ray).
The computer program instructions may also be loaded onto a computer and/or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer and/or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, the invention may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.
It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated.
The skilled person will be able to envisage other embodiments without departing from the scope of the appended claims.

Claims

CLAIMS:
1 . A scheduling node (300) for scheduling a virtual machine (1 1 Oh) to a host computer (108a) in a datacenter, the scheduling node comprising:
a receiver (304) configured to receive a network traffic profile for the virtual machine, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates;
a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile, the closeness factor comprising the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines, wherein the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor; and
a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.
2. A scheduling node (300) according to claim 1 , wherein the closeness factor for the one or more candidate host computers (108a, e-g) comprises the inverse of a sum of a number of hops required to transmit data from a candidate host computer to each of a plurality of peer virtual machines (1 10b-d).
3. A scheduling node (300) according to claim 1 or 2, wherein the network traffic profile further comprises an amount of network traffic sent to and/or received from the one or more peer virtual machines (110b-d), and wherein the closeness factor for the one or more candidate host computers (108a, e-g) is weighted based on one or more of: the amount of network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d); a priority associated with network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d); and a round trip time for network traffic between the virtual machine (110h) and the one or more peer virtual machines (110b-d).
4. A scheduling node (300) according to any preceding claim, wherein the scheduler (310) is configured to determine the closeness factor for each of a plurality of candidate host computers (108a, e-g), and wherein the selected host computer (108a) is selected from the plurality of candidate host computers based at least in part on the closeness factor.
5. A scheduling node (300) according to any preceding claim, wherein each of the one or more candidate host computers (108a, e-g) comprises sufficient available resources to host the virtual machine (1 10h).
6. A scheduling node (300) according to any preceding claim, wherein one or more of the candidate host computers (108a, e-g) is a hardware accelerated host computer.
7. A scheduling node (300) according to claim 6, wherein the scheduler (310) is configured to determine a demand for traffic throughput for the virtual machine (110h), and to determine that the virtual machine is to be hosted in a hardware accelerated host computer if the determined demand for traffic throughput is above a threshold value.
8. A method for scheduling a virtual machine (1 1 Oh) to a host computer (108a) in a datacenter, the method comprising:
receiving (700), by a receiver (304) of a scheduling network node (300), a network traffic profile for the virtual machine, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates;
determining (710), by a scheduler (310), a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile, the closeness factor comprising the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines, wherein the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor; and
transmitting (712), by a transmitter (302), instructions for the determined host computer to host the virtual machine.
9. A non-transitory computer readable medium (309) comprising computer readable code configured, when read and executed by a computer, to carry out the method according to claim 8.
10. A host computer (108a-p) for determining a network traffic profile of a virtual machine (1 10h) in a datacenter, the host computer comprising:
a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine;
a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates; and
a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.
11. A host computer (108) according to claim 10, wherein the network traffic profile comprises an amount of the monitored network traffic sent to and/or received from the one or more peer virtual machines (110b-d).
12. A host computer (108) according to claim 10 or 11, wherein the network traffic profile comprises a rate of transmission and/or reception of the monitored network traffic sent to and/or received from the one or more peer virtual machines (110b-d).
13. A host computer (108) according to claim 10 or 11, wherein the network traffic profile comprises an indication of a round trip time for the monitored network traffic sent to and/or received from the one or more peer virtual machines (110b-d).
14. A host computer (108) according to any of claims 10 to 13, wherein the network traffic profile comprises data indicating a burstiness of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).
15. A host computer (108) according to any of claims 10 to 14, wherein the network traffic profile comprises data indicating a priority associated with monitored network traffic.
16. A method for determining a network traffic profile of a virtual machine (1 10h) in a datacenter, the method comprising:
monitoring (400), by a monitor (210), network traffic sent to and/or received from the virtual machine; determining (402), by a profiler (212), a network traffic profile for the virtual machine based on the monitored network traffic, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates; and
transmitting (404), by a transmitter (202), the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.
17. A non-transitory computer readable medium (209) comprising computer readable code configured, when read and executed by a computer, to carry out the method according to claim 16.
18. A system for profiling a virtual machine (1 1 Oh) and scheduling the virtual machine to a selected host computer (108a) in a datacenter, the system comprising: a host computer (108a-p) comprising a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine, a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates, and a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine;
the scheduling node comprising a receiver (304) configured to receive a network traffic profile for the virtual machine, a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile, the closeness factor comprising the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines, wherein the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor, and a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.
19. A method for profiling a virtual machine (1 1 Oh) and scheduling the virtual machine to a host computer (108a) in a datacenter, the method comprising:
monitoring (800), by a monitor (210) of a host computer (108a-p), network traffic sent to and/or received from the virtual machine;
determining (802), by a profiler (212) of the host computer, a network traffic profile for the virtual machine based on the monitored network traffic, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates;
transmitting (804), by a transmitter (202) of the host computer, the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine;
receiving (806), by a receiver (304) of the scheduling node, a network traffic profile for a virtual machine;
determining (818), by a scheduler (310) of the scheduling node, a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile, the closeness factor comprising the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines, wherein the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor; and
transmitting (820), by a transmitter (302) of the scheduling node, instructions to the determined host computer to host the virtual machine.
20. A non-transitory computer readable medium comprising computer readable code configured, when read and executed by a computer, to carry out the method according to claim 19.
PCT/EP2013/068297 2013-09-04 2013-09-04 Scheduling of virtual machines WO2015032430A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2013/068297 WO2015032430A1 (en) 2013-09-04 2013-09-04 Scheduling of virtual machines


Publications (1)

Publication Number Publication Date
WO2015032430A1 true WO2015032430A1 (en) 2015-03-12

Family

ID=49150931

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/068297 WO2015032430A1 (en) 2013-09-04 2013-09-04 Scheduling of virtual machines

Country Status (1)

Country Link
WO (1) WO2015032430A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070211280A1 (en) * 2006-03-13 2007-09-13 Nikhil Bansal Method and apparatus for assigning candidate processing nodes in a stream-oriented computer system
US8099487B1 (en) * 2006-07-06 2012-01-17 Netapp, Inc. Systems and methods for determining placement of virtual machines
WO2011096859A1 (en) * 2010-02-04 2011-08-11 Telefonaktiebolaget L M Ericsson (Publ) Network performance monitor for virtual machines
WO2012141573A1 (en) * 2011-04-12 2012-10-18 Mimos Berhad Method and system for automatic deployment of grid compute nodes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BARNABY MALET ET AL: "Resource allocation across multiple cloud data centres", PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON MIDDLEWARE FOR GRIDS, CLOUDS AND E-SCIENCE, MGC '10, 1 January 2010 (2010-01-01), New York, New York, USA, pages 1 - 6, XP055014744, ISBN: 978-1-45-030453-5, DOI: 10.1145/1890799.1890804 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9575794B2 (en) 2014-09-30 2017-02-21 Nicira, Inc. Methods and systems for controller-based datacenter network sharing
US10237136B2 (en) 2014-09-30 2019-03-19 Nicira, Inc. Method of distributing network policies for data compute nodes in a datacenter
US11082298B2 (en) 2014-09-30 2021-08-03 Nicira, Inc. Controller-based datacenter network bandwidth policy sharing
CN106453457A (en) * 2015-08-10 2017-02-22 微软技术许可有限责任公司 Multi-priority service instance distribution in cloud computing platform
US10630765B2 (en) 2015-08-10 2020-04-21 Microsoft Technology Licensing, Llc Multi-priority service instance allocation within cloud computing platforms
WO2017028930A1 (en) * 2015-08-20 2017-02-23 Telefonaktiebolaget Lm Ericsson (Publ) Methods and apparatus for running an analytics function
WO2017152178A1 (en) * 2016-03-04 2017-09-08 Bladelogic, Inc. Provisioning of containers for virtualized applications
US10693948B2 (en) 2016-03-04 2020-06-23 Bladelogic Inc. Provisioning of containers for virtualized applications
AU2017228442B2 (en) * 2016-03-04 2020-11-05 Bladelogic, Inc. Provisioning of containers for virtualized applications
CN109544999A (en) * 2019-01-14 2019-03-29 中国民航大学 A kind of air traffic networks method for evaluating reliability based on cloud model

Similar Documents

Publication Publication Date Title
US10447594B2 (en) Ensuring predictable and quantifiable networking performance
EP3624400B1 (en) Technologies for deploying virtual machines in a virtual network function infrastructure
US10355959B2 (en) Techniques associated with server transaction latency information
CN107852413B (en) Network device, method and storage medium for offloading network packet processing to a GPU
CN107925588B (en) Method, apparatus, device and medium for platform processing core configuration
US10932136B2 (en) Resource partitioning for network slices in segment routing networks
WO2018086569A1 (en) Dynamic sdn configuration method based on application awareness of virtual network
EP2907276B1 (en) System and method for efficient use of flow table space in a network environment
EP2972855B1 (en) Automatic configuration of external services based upon network activity
US11394649B2 (en) Non-random flowlet-based routing
US9559968B2 (en) Technique for achieving low latency in data center network environments
US10397131B2 (en) Method and system for determining bandwidth demand
US20100287262A1 (en) Method and system for guaranteed end-to-end data flows in a local networking domain
US20140025823A1 (en) Methods for managing contended resource utilization in a multiprocessor architecture and devices thereof
US10153979B2 (en) Prioritization of network traffic in a distributed processing system
WO2014077904A1 (en) Policy enforcement in computing environment
WO2015032430A1 (en) Scheduling of virtual machines
Hwang et al. Deadline and incast aware TCP for cloud data center networks
WO2020232182A1 (en) Quality of service in virtual service networks
US11490366B2 (en) Network function virtualisation
Chakraborty et al. A low-latency multipath routing without elephant flow detection for data centers
WO2018057165A1 (en) Technologies for dynamically transitioning network traffic host buffer queues
KR102174979B1 (en) Method for controlling transsion of packet in virtual switch
CN114567481A (en) Data transmission method and device, electronic equipment and storage medium
JP7148596B2 (en) Network-aware elements and how to use them

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13759716

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE