WO2015032430A1

WO2015032430A1 - Scheduling of virtual machines

Info

Publication number: WO2015032430A1
Application number: PCT/EP2013/068297
Authority: WO
Inventors: Victor Souza
Original assignee: Telefonaktiebolaget L M Ericsson (Publ)
Priority date: 2013-09-04
Filing date: 2013-09-04
Publication date: 2015-03-12

Abstract

Methods and apparatus for profiling a virtual machine (110h) and/or scheduling the virtual machine to a selected host computer (108a) in a datacenter. A system comprises a host computer (108a-p) comprising a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine. The host computer comprises a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (110b-d) with which the virtual machine communicates. The host computer comprises a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine. The scheduling node comprises a receiver (304) configured to receive a network traffic profile for the virtual machine. The scheduling node comprises a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The scheduling node comprises a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.

Description

SCHEDULING OF VIRTUAL MACHINES

Technical field This invention relates to methods and apparatus for scheduling of virtual machines (VMs) in host computers (hosts) in a datacenter. Specifically, the invention may relate to, but is not limited to, scheduling of VMs in a cloud datacenter.

Background

Datacenters often make use of virtualisation technologies, which may define a plurality of VMs capable of undertaking particular tasks within the datacenter. VMs may exist in any of a plurality of hosts (e.g., servers) that form a part of the datacenter and the location of one or more VMs within one or more hosts affects the speed of operations and the amount of network traffic within a datacenter.

Many cloud datacenter solutions comprise a scheduler for determining where a VM should be executed (i.e., which host computer). Those schedulers range from very simple schedulers that, for example, randomly pick a host to very advanced schedulers that, for example hold multiple sub-schedulers. VM scheduling directly impacts the efficiency of a datacenter and the performance of running VMs within the datacenter.

In known solutions, for example, a scheduler may implement different scheduling policies (sometimes called scheduling objectives), depending on the objective of the user and cloud provider. A packing policy minimizes the number of hosts in use. A stripping policy maximizes the resources available to a VM by spreading the VMs across a plurality of hosts. A load-aware policy maximizes the resources available to a VM by using hosts with greater available capacity. Existing technology focuses on the allocation of computation resources and memory. When deploying payload applications that deal with the payload of data packets, networking resource is more important. This is valid for most telecommunications and network related products. A VM placement algorithm is disclosed in nline traffic-aware virtual machine placement in data center networks", Dias & Costa. Traffic between all VMs in a datacenter is used to create a traffic matrix. The datacenter network is modelled as a graph using the traffic matrix to cluster VMs that exchange traffic. In "cloudmirror: application-aware bandwidth reservations in the cloud", Lee et al, an abstraction for specifying bandwidth guarantees as a graph between components in a network.

A method of achieving more efficient allocation of resources within a datacenter is desirable. This is even truer for telecommunication applications deployed in a datacenter, as such applications are commonly payload applications. Examples of telecommunications applications deployed in datacenters include evolved packet gateways, routers, network proxies and middleboxes, amongst others.

It is advantageous to determine the optimal host machine in which a given VM may be located.

Summary

According to the invention in a first aspect, there is provided a scheduling node (300) for scheduling a virtual machine (1 10h) to a host computer (108a) in a datacenter. The scheduling node comprises a receiver (304) configured to receive a network traffic profile for the virtual machine, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The scheduling node comprises a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The scheduling node comprises a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.

Optionally, the closeness factor for the one or more candidate host computers (108a, e-g) comprises the inverse of a sum of a number of hops required to transmit data from a candidate host computer to each of a plurality of peer virtual machines (1 10b-d). Optionally, the network traffic profile further comprises an amount of network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d), and wherein closeness factor for the one or more candidate host computers (108a, e-g) is weighted based on one or more of the amount of network traffic between the virtual machine (1 1 Oh) and the one or more peer virtual machines (1 10b-d); a priority associated with network traffic between the virtual machine (1 10h) and the one or more peer virtual machines (1 10b-d); and a round trip time for network traffic between the virtual machine (1 10h) and the one or more peer virtual machines (1 10b-d). Optionally, the scheduler (310) is configured to determine the closeness factor for each of a plurality of candidate host computers (108a, e-g), and wherein the selected host computer (108a) is selected from the plurality of candidate host computers based at least in part on the closeness factor. Optionally, each of the one or more candidate host computers (108a, e-g) comprises sufficient available resources to host the virtual machine (1 1 Oh).

Optionally, one or more of the candidate host computers (108a, e-g) is a hardware accelerated host computer.

Optionally, the scheduler (310) is configured to determine a demand of traffic throughput for a the virtual machine (1 10h), and to determine that the virtual machine is to be hosted in a hardware accelerated host computer if the determined demand for traffic throughput is above a threshold value.

According to the invention in a second aspect, there is provided a method for scheduling a virtual machine (1 1 Oh) to a host computer (108a) in a datacenter. The method comprises receiving (700), by a receiver (304) of a scheduling network node (300), a network traffic profile for the virtual machine. The network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The method comprises determining (710), by a scheduler (310), a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The method comprises transmitting (712), by a transmitter (302), instructions for the determined host computer to host the virtual machine.

According to the invention in a third aspect, there is provided a non-transitory computer readable medium (309) comprising computer readable code configured, when read and executed by a computer, to carry out the method described above.

According to the invention in a fourth aspect, there is provided a host computer (108a- p) for determining a network traffic profile of a virtual machine (1 1 Oh) in a datacenter. The host computer comprises a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine. The host computer comprises a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The host computer comprises a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.

Optionally, the network traffic profile comprises an amount of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).

Optionally, the network traffic profile comprises a rate of transmission and/or reception of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).

Optionally, the network traffic profile comprises an indication of a round trip time for the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d). Optionally, the network traffic profile comprises data indicating a burstiness of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).

Optionally, the network traffic profile comprises data indicating a priority associated with monitored network traffic. According to the invention in a fifth aspect, there is provided a method for determining a network traffic profile of a virtual machine (1 1 Oh) in a datacenter. The method comprises monitoring (400), by a monitor (210), network traffic sent to and/or received from the virtual machine. The method comprises determining (402), by a profiler (212), a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The method comprises transmitting (404), by a transmitter (202), the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.

According to the invention in a sixth aspect, there is provided a non-transitory computer readable medium (209) comprising computer readable code configured, when read and executed by a computer, to carry out the described above. According to the invention in a seventh aspect, there is provided a system for profiling a virtual machine (1 1 Oh) and scheduling the virtual machine to a selected host computer (108a) in a datacenter. The system comprises a host computer (108a-p) comprising a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine. The host computer comprises a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The host computer comprises a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine. The scheduling node comprises a receiver (304) configured to receive a network traffic profile for the virtual machine. The scheduling node comprises a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The scheduling node comprises a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine. According to the invention in eighth aspect, there is provided a method for profiling a virtual machine (1 10h) and scheduling the virtual machine to a host computer (108a) in a datacenter. The method comprises monitoring (800), by a monitor (210) of host computer (108a-p), network traffic sent to and/or received from the virtual machine. The method comprises determining (802), by a profiler (212) of the host computer, a network traffic profile for the virtual machine based on the monitored network traffic. The network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates. The method comprises transmitting (804), by a transmitter (202) of the host computer, the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine. The method comprises receiving (806), by a receiver (304) of the scheduling node, a network traffic profile for a virtual machine. The method comprises determining (818), by a scheduler (310) of the scheduling node, a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile. The closeness factor comprises the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines. The scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor. The method comprises transmitting (820), by a transmitter (302) of the scheduling node, instructions to the determined host computer to host the virtual machine. According to the invention in a ninth aspect, there is provided a non-transitory computer readable medium comprising computer readable code configured, when read and executed by a computer, to carry out the method described above.

Brief description of the drawings

Exemplary methods and apparatus are described herein with reference to the accompanying drawings, in which:

Figure 1 is a schematic of a datacenter topology;

Figure 2 is a schematic of a host computer;

Figure 3 is a schematic of a scheduling node; Figure 4 is a flow diagram of a method for determining a network traffic profile of a virtual machine (1 1 Oh) in a datacenter; Figure 5 is a schematic of a datacenter topology; Figure 6 is a schematic of a datacenter topology;

Figure 7 is a flow diagram of a method for scheduling a virtual machine (1 10h) to a host computer (108a) in a datacenter; and

Figure 8 is a flow diagram of a method for profiling a virtual machine (1 1 Oh) and scheduling the virtual machine to a host computer (108a) in a datacenter.

Description

Generally, disclosed herein are methods and apparatus for profiling and/or scheduling virtual machines within a database of host computers (e.g. servers). Exemplary methods and apparatus may determine network capacity and location for a plurality of candidate hosts and schedule a VM to one of the plurality of candidate hosts based on the determined network capacity and/or the location. This may provide advantages of reduced network traffic within a datacenter.

Methods and apparatus disclosed herein leverage recent advancements in hardware accelerated virtualization. An existing technology for hardware-accelerated virtualized networking is single-root input/output virtualization (SR-IOV). SR-IOV provides a method of virtualizing a peripheral component interconnect (PCI) card and/or peripheral component interconnect express (PCIe) card, which means that a single PCI card can show up as many virtual PCI cards. An SR-IOV network card possesses independent virtual functions (VFs), which behave just like an independent network interface controller (NIC) for a VM. This allows VMs to have a dedicated virtual NIC in hardware, with a higher throughput of network traffic than through a virtual switch (in software).

The inventors have appreciated that if payload applications are deployed in datacenters in which VM scheduling is based on the allocation of computation resources and memory, there is no way to demand a given network capacity for that application (VM). Applications may then underperform, time-out and/or crash, depending on the nature of the protocols involved. Moreover, the inventors have appreciated that once applications are running in a datacenter, scheduling should be adaptable to the networking needs and communication patterns of its applications.

Methods and apparatus disclosed herein propose heuristic scheduling of VMs in hosts of a datacenter. Such heuristics may be based on a determined network traffic profile of a VM. The network traffic profile may indicate the networking needs of running one or more VMs in one or more hosts in a datacenter, e.g., measured in terms of data packets transmitted/received, possibly over a given time period. The network traffic profile may be determined in an automated fashion by one or more profilers in one or more hosts by, for example, a hypervisor application. The one or more profilers configured to determine the network traffic profile may advantageously be located in one or more of a plurality of hosts. In specific exemplary methods and apparatus, a profiler may be located in every host hypervisor where a VM may need to be profiled. A network traffic profile of a VM may be transmitted to a scheduling node of the datacenter comprising a scheduler. The scheduling node possesses a topology map of the underlying network infrastructure and is configured to determine a host computer that should host the VM based on the network traffic profile. It is noted that in some exemplary methods and apparatus, the host configured to monitor the VM and determine the network traffic profile may be collocated with the scheduling node and may be the same node of the datacenter.

Hosts with enhanced capabilities may be present in a datacenter. An example of such hosts includes those with hardware accelerated network virtualization (e.g., SR-IOV or Virtual Machine Device Queues (VMDq)). VMs with stringent networking requirements, as determined by monitoring network traffic associated with a VM for determining a network traffic profile, may be allocated hosts with better networking features. This may allow for better execution of payload applications.

Exemplary methods and apparatus disclosed herein may group VMs that communicate a lot with each other. That is, a VM will communicate commonly with a plurality of peer VMs and the VM may be located in close proximity to such peer VMs. By doing so, the amount of network traffic in a datacenter may be mitigated and unnecessary overloading of the datacenter links may be avoided. Methods and apparatus disclosed herein may use a notion of "closeness" to determine a closeness factor, as defined herein, to determine whether a host is in close proximity to peer VMs (or host computers hosting peer VMs). The closeness factor may be weighted by the amount of traffic that the monitored VM sends to and receives from each peer VM. Exemplary methods and apparatus may comprise one or more of the following:

• Determining a network traffic profile of a VM running on a first host by monitoring network traffic by a hypervisor function

• Transmitting the network traffic profile information to a scheduling node comprising a network aware scheduler

• Determining whether there is a better host in which to run the VM, e.g., a hardware accelerated host

• Scheduling the VM close to other VMs it communicates with. These are from here on called peer VMs.

Figure 1 shows a typical structure of a datacenter network comprising a plurality of nodes 100, 102a-b, 104a-d, 106a-h, 108a-p. A datacenter border gateway 100 may control access for the datacenter network to the Internet. The datacenter border gateway 100 may therefore be a router. The datacenter border gateway is in electrical communication with two layer 2 (of the open systems interconnection (OSI) model) switches 102a-b. The layer 2 switches 102a-b are each in electrical communication with four aggregation switches 104a-d. The aggregation switches 104a-d are each in electrical communication with two edge (or access) switches 106a-h. The edge switches 106a-h are each in electrical communication with two host computers 108a-p, which are configured to host one or more VMs 1 10a-p. For clarity, not all of the edge switches 106a-h, hosts 108a-p and VMs 1 10a-p are individually referenced, but these features of Figure 2 run consecutively from left to right with "a" on the left hand side.

This typical arrangement of a datacenter network is also known as a tree structure. Other arrangements include VL2, fat-tree, and B-Cube. It is noted that even though the solution herein presented can be applied to those other types of arrangements and more. The topological map of the datacenter network is stored in the scheduler.

It is noted that the number of each of the nodes 100, 102a-b, 104a-d, 106a-h, 108a-p of the datacenter topography shown in Figure 1 are for illustrative purposes only. Each node may be in electrical connection with more or fewer nodes than shown in Figure 1 . Further, it is noted that the structure of Figure 1 is a hierarchical structure, in which hosts 108a-p may be considered the leaves of the network. Hosts 108a-p can have different network cards, features and capacity. Hosts 108a-p may have support for hardware-accelerated network virtualization.

The majority of network traffic in a datacentre stays within the datacentre. This is termed east-west traffic. A few VMs may 1 10a-p communicate with the external world, through core switches and the datacenter border gateway 100, but it is estimated that approximately 70% of traffic is internal to the datacentre. The methods and apparatus disclosed herein may schedule a number of VMs 1 10a-p that communicate a lot with each other in close proximity, turning traffic as early as possible in the network hierarchy, thus sparing datacentre links and reducing the amount of network traffic within the datacenter. Figure 2 shows a schematic representation of a host computer 108. The methods and apparatus disclosed herein permit the host computer 108 to monitor network traffic associated with a VM to determine a network traffic profile for the VM. The host 108 comprises a transmitter 202 and a receiver 204, which form part of a communication unit 205. The transmitter 202 and receiver 204 are in electrical communication with other nodes and/or functions in a datacenter and are configured to transmit and receive data accordingly.

As used herein, the term "network traffic profile" encompasses data relating to the type (e.g., TCP or UDP), rate (e.g. packets/sec and/or bytes/sec), burstiness, priority settings, round trip time (RTT), source and/or destination of network traffic sent from and received by a VM. In specific methods and apparatus, as set out below, the network traffic profile may comprise addresses of peer VMs 1 10 with which a VM communicates, an amount of network traffic sent from and/or received by the VM and/or an indication of the rate of data sent from and/or received by the VM (e.g. in packets/sec and bytes/sec). However, the solution is not limited to only those parameters, e.g., RTT and priority setting may be used as well. The host 108 further comprises a memory 206 and a processor 208. The memory 206 may comprise non-volatile memory and/or volatile memory. The memory 206 may have a computer program 207 stored therein. The computer program 207 may be configured to undertake the methods disclosed herein. The computer program 207 may be loaded in the memory 206 from a non-transitory computer readable medium 209, on which the computer program is stored. The processor 208 is configured to undertake the functions of a monitor 210, and a profiler 212.

Each of the transmitter 202, receiver 204, communications unit 205, memory 206, processor 208, monitor 210 and profiler 212 is in electrical communication with the other features 202, 204, 205 206, 208, 210, 212 of the host 108. The host 108 can be implemented as a combination of computer hardware and software. In particular, the monitor 210 and profiler 212 may be implemented as software configured to run on the processor 208. The memory 206 stores the various programs/executable files that are executed in a processor 208, and also provides a storage unit for any required data. The programs/executable files stored in the memory 206, and executed in the processor 208, can include the monitor 210 and the profiler 212, but are not limited to such. Figure 3 shows a schematic representation of a scheduling node 300 of a datacenter network configured to schedule a VM to a host 108 in accordance with the methods and apparatus disclosed herein. Specifically, the scheduling node 300 may be configured to receive a network traffic profile and schedule a VM to a host 108, based on that received profile. The scheduling node 300 may form part of any of the nodes 100, 102, 104, 1066, 108 of the datacenter topology shown in Figure 1. Alternatively, the scheduling node 300 may be a separate node that is independent from the topology shown in Figure 1 , typically part of a datacenter Cloud Management System. It is noted that in some methods and apparatus, the scheduling node 300 may be the same node as the host computer 108 configured to monitor the VM and determine the network traffic profile, as set out below. In such methods and apparatus, the transmission of the network traffic profile to the scheduling node 300 may be an internal transmission within a single node. In other methods and apparatus, the scheduling node 300 may be deployed in a virtual machine that may be physically deployed in any of the hosts. The scheduling node 300 comprises a transmitter 302 and a receiver 304, which form part of a communication unit 305. The transmitter 302 and receiver 304 are in electrical communication with other nodes and/or functions in a datacenter and are configured to transmit and receive data accordingly.

The scheduling node 300 further comprises a memory 306 and a processor 308. The memory 306 may comprise non-volatile memory and/or volatile memory. The memory

306 may have a computer program 307 stored therein. The computer program 307 may be configured to undertake the methods disclosed herein. The computer program

307 may be loaded in the memory 306 from a non-transitory computer readable medium 309, on which the computer program is stored. The processor 308 is configured to undertake the functions of a scheduler 310.

Each of the transmitter 302, receiver 304, communications unit 305, memory 306, processor 308 and scheduler 310 is in electrical communication with the other features 302, 304, 305, 306, 308, 310 of the scheduling node 300. The scheduling node 300 can be implemented as a combination of computer hardware and software. In particular, the scheduler 310 may be implemented as software configured to run on the processor 308. The memory 306 stores the various programs/executable files that are executed in a processor 308, and also provides a storage unit for any required data. The programs/executable files stored in the memory 306, and executed in the processor 308, can include the scheduler 310, but are not limited to such.

The methods and apparatus disclosed herein are capable of profiling the networking needs of one or more VMs 1 10a-p. The objective is to perform characterisation of the network traffic pattern sent and/or received by a VM 1 10a-p. This is done through the use of enhanced counters that may be added to a hypervisor function running on a host 108a-p. The monitor 210 and profiler 212 may form part of a hypervisor. Existing counters typically monitor a total amount of traffic in/out of a network card.

Referring back to Figure 1 , a VM 1 1 Oh is considered to be a monitored VM and the VMs 1 10b-d are considered to be peer VMs with which the monitored VM communicates. The monitor 210 may comprise an enhanced counter configured to log the source and/or destination of network traffic received and/or sent by the monitored VM 1 10h. The monitor 210 may also comprise enhanced counters configured to determine an amount of network traffic sent to and/or received from each of a plurality of peer VMs 1 10b-d, which may be averaged over a period of time (packets/sec and bytes/sec).

Based on the monitored network traffic, the profiler 212 may be configured to output a list for each one of the VMs 1 10b-d with which a monitored VM 1 1 Oh communicates. This list may contain the IP addresses of the peer VMs 1 10b-d and the average amount of network traffic (packets/sec and bytes/sec) sent to and/or received from each peer VM 1 10b-d. The profiler 212 may also be configured to determine the rate and/or burstiness of network traffic sent to and/or received from each peer VM 1 10b-d and a round trip time (RTT) for network traffic sent to and/or received from each peer VM 1 10b-d. Other networking related parameters may also be monitored and determined.

The VMs 1 10a-h may be profiled at specific times while the datacenter is in operation and information fed to the scheduler 310 of the scheduling node 300. The profiler 212 may apply certain techniques to minimize the amount of data sent to the scheduler 310. Examples include, filtering out peer VMs 1 10b-d with a traffic volume below a certain threshold and reporting to the scheduler 312 only when substantial changes in a network traffic profile are determined.

Figure 4 shows a method for determining a network traffic profile for a VM 1 1 Oh. A monitor 210 of a host 108a-p monitors 400 network traffic associated with the VM 1 1 Oh. The monitor may therefore monitor one or more of the destination address, source address, rate and amount of the data sent to and/or received from the monitored VM 1 10h. This may be done by intercepting such traffic and receiving it at the receiver 205 of the host 108a-p. Therefore, it is advantageous for the monitor 210 configured to monitor the traffic of the VM 1 10h to be in the host 108h in which the VM 1 10h resides. The profiler 212 of the host 108a-p determines 402 a network traffic profile for the monitored VM 1 10h based on the monitored network traffic. The network traffic profile represents information indicating the networking needs of the monitored VM 1 10h. The profile indicates the peer VMs 1 10b-d with which the monitored VM communicates and the amount of communication with each, which may be used to determine the best location for the monitored VM. Further, the network traffic profile may indicate the rate (e.g. in packets/sec and/or bytes/sec), burstiness, priority settings, and RTT of the network traffic, all of which may be used to determine the hardware requirements of a host 108 in which the VM 1 10h is to reside.

The transmitter 202 of the host 108 transmits 404 the network traffic profile to the scheduling node 300.

The network traffic profile information may be used to schedule the monitored VM 1 10h to reside in an optimal location (host 108a-p). The optimal location may take into account the network resources needed by the monitored VM 1 1 Oh and determinable from the network traffic profile. The optimal location may also take account of the available hosts, whether the available hosts are hardware accelerated and the locations of the hosts 108b-d in which the peer VMs 1 10b-d reside. It is noted that the term "location" with reference to hosts 108a-h and/or VMs 1 10a-h may refer to location within the topology of the datacenter network and may not be associated with a physical location.

Exemplary methods and apparatus may be arranged to determine a host 108a-p in which the monitored VM 100h may reside, based on amount of packets/sec sent by the VM 100h and the capacity of a host 108a-p to provide the required amount of traffic. Hosts with hardware-accelerated virtualization are known to handle higher amounts of traffic (e.g. packets/sec) and smaller packets.

Exemplary methods and apparatus may be arranged to determine a host 108a-p in which the monitored VM 1 10h may reside, based on the host's 108a-p proximity to one or more of the peer VMs 1 1 Ob-d.

Figure 5 illustrates a scenario where a VM 1 1 Oh resides on a host 108h. A network traffic profile determined for VM 1 1 Oh shows that it communicates with the peer VMs 1 1 Ob-d. This is shown by the arrow 500. Intuitively, it is possible to determine from Figure 5 that if resources were available at the host 108a, network links could be spared by moving VM 1 1 Oh to host 108a. Even though this is quite intuitive when looking at a Figure 5, it is not easily determined by a computer system. The inventors have therefore defined an algorithm for determining the closeness of hosts 108a-p to each other. Obviously, more than one VM may be scheduled in a host (not shown in Figure 5). In its broadest sense, the determination of closeness comprises the inverse of the number of hops (links crossed) when travelling from a first host 108a-p to a second host 108a-p. Therefore, a closeness factor may be defined as:

closeness =— -—

hops

Where "hops" is the number of links crossed. For the avoidance of doubt, it is noted that transmitting data from the host 108a to the edge switch 106a involves one hop and transmitting data from the host 108a to the host 108b involves two hops (one hop to the edge switch 106a and a second hop to the host 108b).

The use of such a closeness factor means that there is no requirement to prepare a matrix relating to all hosts in a datacenter. The closeness factor may be a unique value attributable to a given host, based on the networking needs of a given VM. Further, the host in which a VM resides may be configured to monitor that VM, rather than having to monitor all VMs in the datacenter. This allows a particular VM to be scheduled efficiently with a reduced computational burden.

However, there are typically a plurality of peer VMs 1 10b-d, as shown in Figure 5. Therefore, the closeness algorithm may be refined to sum all the hops required to transmit data from a first host 108a-p to a second host 108a-p hosting a first peer VM 1 10b-d, and from the first host 108a-p to a third host 108a-p hosting a second peer VM 1 10b-d. Therefore, a closeness factor may be defined as:

closeness =

Where "i" is the index of the peer VM 1 10b-d. This allows the closeness of a host 108a-p to a plurality of hosts 108b-d hosting the peer VMs 1 10b-d to be determined. For example, the closeness factor of host 108h to the hosts 108b-d is the inverse of the sum of the number of hops from host 108h to the host 108b, the number of hops from host 108h to the host 108c and the number of hops from host 108h to the host 108d.

A further refinement of the closeness factor may be implemented to allow it to be related to the amount of traffic sent to and/or received from the peer VMs 1 10b-d. Accordingly, the volume of traffic should influence the determination of the host 108a-p in which the monitored VM 1 10h should be placed. Therefore, the closeness factor may be weighted by the amount of traffic sent to and/or received from a peer VM 1 10b- d:

closeness = i

j_; hops(i) x traffic{i)

Where "i" is the index of the peer VM 1 10b-d. Therefore, when considering the closeness of the host computers 108a and 108e-p that are not hosting peer VMs 1 10b-d to the host computers 108b-d that are hosting the peer VMs 1 10b-d, it can be seen that host 108a is the closest.

Figure 6 shows a datacenter network topology after the monitored VM 1 1 Oh has been scheduled to be hosted by the host 108a. It is clear from Figure 6 that fewer hops are required for the VM 1 10h to transmit data to and receive data from the peer VMs 1 10b- d. No traffic is required to be sent over the links between the edge switch 106c and the aggregation switch 104b and no traffic is sent over the links between the aggregation switches 104a-b and the layer 2 switch 102a.

In exemplary methods and apparatus, candidate hosts are determined initially. The candidate hosts comprise hosts with enough computational resource and memory to run VM 1 1 Oh. Hosts that have less network capacity than is needed to run VM 1 1 Oh are not part of the list of candidate hosts, to determine a reduced list of candidate hosts. The amount of network capacity needed to run VM 1 1 Oh may be determined from the network traffic profile for the VM 1 10h. For each of the hosts 108a-p on the reduced list, the closeness factor is determined, as set out above. A host 108a-p is determined in which the VM 1 10h is to be hosted, based on the determined closeness factors. In exemplary methods and apparatus, the host 108a-p with highest closeness factor is selected, thus grouping together VMs that communicate a lot with each other.

Figure 7 shows a flow chart for a method for scheduling a VM to a host computer. The receiver 304 of the scheduling node 300 receives 700 the network traffic profile for the VM 1 10h. A set of candidate hosts is selected 701 by the scheduler 310 of the scheduling node 300. This may be done using a number of hosts, say ten, and the list of candidates may be determined based on the likely closeness of hosts to the peer VMs 1 10b-d. For example, if a host shares an edge switch 106a-h, an aggregation switch 104a-d or a layer 2 switch 102a-b with one or more of the peer VMs 1 10b-d. In exemplary methods and apparatus, the candidate list may be determined to include only hardware accelerated hosts. For illustrative purposes, the candidate hosts are considered to be 108a, e-g.

The scheduler 310 determines 702 whether the first host 108a of the candidate hosts 108a, e-g has sufficient network capacity to host the VM 1 1 Oh. This may be done based on the received network traffic profile. If no, the candidate host 108a is removed 704 from the list of candidate hosts 108a, e-g. The scheduler 310 then determines 706 whether any more candidate hosts remain to be assessed for network capacity. If yes, steps 702-706 are repeated for the next host 108e. If no, then all candidate hosts have been assessed for network capacity and a reduced list of candidate hosts has been determined. For illustrative purposes, the reduced list is considered to be the same as the original list. That is, it is considered that all candidate hosts 108a, e-g have sufficient network capacity.

The scheduler 310 determines 708 the closeness of the candidate hosts 108a, e-g on the reduced list, as set out above. The scheduler determines 710 the optimal host as the host on the reduced list having the highest closeness factor. For illustrative purposes, the optimal host is considered to be 108a. The transmitter 302 transmits 712 instructions for the determined host 108a to host the monitored VM 1 1 Oh as part of a VM migration process.

Figure 8 shows a flow diagram for the profiling and scheduling of a VM in a host computer of a datacenter. A monitor 210 of a host 108a-p monitors 800 network traffic associated with the VM 1 10h, as set out above. The profiler 212 of the host 108a-p determines 802 a network traffic profile for the monitored VM 1 1 Oh based on the monitored network traffic, as set out above. The transmitter 202 of the host 108 transmits 804 the network traffic profile to the scheduling node 300.

The receiver 304 of the scheduling node 300 receives 806 the network traffic profile for the VM 1 1 Oh. A set of candidate hosts is selected 808 by the scheduler 310 of the scheduling node 300, as set out above. The scheduler 310 determines 810 whether the first host 108a of the candidate hosts 108a, e-g has sufficient network capacity to host the VM 1 1 Oh, as set out above. If no, the candidate host 108a is removed 812 from the list of candidate hosts 108a, e-g. The scheduler 310 then determines 814 whether any more candidate hosts remain to be assessed for network capacity. If yes, steps 810-814 are repeated for the next host 108e. If no, then all candidate hosts have been assessed for network capacity and a reduced list of candidate hosts has been determined. As above, or illustrative purposes, the reduced list is considered to be the same as the original list. The scheduler 310 determines 816 the closeness of the candidate hosts 108a, e-g on the reduced list, as set out above. The scheduler determines 818 the optimal host as the host on the reduced list having the highest closeness factor. As above, for illustrative purposes, the optimal host is considered to be 108a. The transmitter 302 transmits 820 instructions for the determined host 108a to host the monitored VM 1 10h.

In exemplary methods and apparatus, based on the information in the network traffic profile, VMs 1 10a-p may be classified into two types.

1 . Signalling VMs: these are virtual machines hosting signalling applications, where the volume of traffic sent/received is low

2. Payload VMs: these are virtual machines hosting payload applications with high traffic volume and high throughput

Type 1 VMs have typically low traffic volume and low throughput and may communicate with a large set of peer VMs, depending on the application type. Type 2 VMs have typically high traffic volume and high throughput and typically communicate with a large set of peer VMs, either intra or inter-datacentre. The scheduler 310 may be configured to treat VMs of type 1 and type 2 differently. Type 1 VMs may be configured to be scheduled to any of the closest available host 108a-p having sufficient network capacity. Type 2 VMs may be configured to be scheduled to the closest available hardware accelerated host 108a-p having sufficient network capacity. Therefore, for Type 2 VMs, the scheduler 310 may be configured to determine a candidate host list comprising only hardware accelerated VMs.

The scheduler may be configured to use further profiling information provided by the profiler. For example, the scheduler may weight the closeness factor based on traffic priority settings thus placing the monitored VM closer to a peer VM whose traffic has higher priority settings. Other methods and apparatus include weighting the closeness factor with the RTT, thus improving perceived delay for the worst RTT results.

Exemplary methods and apparatus may determine a network traffic profile for one or more VMs at intervals while the datacenter is in operation. Therefore, the datacenter may be configured to reschedule VMs on the fly according to whether the network traffic profile has changed.

Exemplary methods and apparatus disclosed herein propose a scheduler that is network aware through the use of heuristics, taking into account the networking needs of each VM. Existing solutions focus on compute, memory and storage requirements only. Instead of demanding the owner of the VM to determine a priori the network needs, methods and apparatus disclosed herein may automatically detect the network traffic profiles of VMs to determine the optimal hosts for VMs. Methods and apparatus disclosed herein may adapt to the communication patterns of VMs by dynamically rescheduling VMs to hosts. This may optimise VM placement according to the peer VMs it communicates with. In the end, methods and apparatus disclosed herein may create conglomerates of VMs that talk intensively to each other. A computer program may be configured to provide any of the above described methods. The computer program may be provided on a computer readable medium. The computer program may be a computer program product. The product may comprise a non-transitory computer usable storage medium. The computer program product may have computer-readable program code embodied in the medium configured to perform the method. The computer program product may be configured to cause at least one processor to perform some or all of the method.

Various methods and apparatus are described herein with reference to block diagrams or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).

Computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer- readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. A tangible, non-transitory computer-readable medium may include an electronic, magnetic, optical, electromagnetic, or semiconductor data storage system, apparatus, or device. More specific examples of the computer-readable medium would include the following: a portable computer diskette, a random access memory (RAM) circuit, a read-only memory (ROM) circuit, an erasable programmable read-only memory (EPROM or Flash memory) circuit, a portable compact disc read-only memory (CD- ROM), and a portable digital video disc read-only memory (DVD/Blu-ray).

The computer program instructions may also be loaded onto a computer and/or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer and/or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, the invention may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.

It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated.

The skilled person will be able to envisage other embodiments without departing from the scope of the appended claims.

Claims

CLAIMS:

1 . A scheduling node (300) for scheduling a virtual machine (1 1 Oh) to a host computer (108a) in a datacenter, the scheduling node comprising:

a receiver (304) configured to receive a network traffic profile for the virtual machine, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates;

a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile, the closeness factor comprising the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines, wherein the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor; and

a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.

2. A scheduling node (300) according to claim 1 , wherein the closeness factor for the one or more candidate host computers (108a, e-g) comprises the inverse of a sum of a number of hops required to transmit data from a candidate host computer to each of a plurality of peer virtual machines (1 10b-d).

3. A scheduling node (300) according to claim 1 or 2, wherein the network traffic profile further comprises an amount of network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d), and wherein closeness factor for the one or more candidate host computers (108a, e-g) is weighted based on one or more of the amount of network traffic between the virtual machine (1 10h) and the one or more peer virtual machines (1 10b-d); a priority associated with network traffic between the virtual machine (1 1 Oh) and the one or more peer virtual machines (1 10b-d); and a round trip time for network traffic between the virtual machine (1 1 Oh) and the one or more peer virtual machines (1 10b-d).

4. A scheduling node (300) according to any preceding claim, wherein the scheduler (310) is configured to determine the closeness factor for each of a plurality of candidate host computers (108a, e-g), and wherein the selected host computer (108a) is selected from the plurality of candidate host computers based at least in part on the closeness factor.

5. A scheduling node (300) according to any preceding claim, wherein each of the one or more candidate host computers (108a, e-g) comprises sufficient available resources to host the virtual machine (1 10h).

6. A scheduling node (300) according to any preceding claim, wherein one or more of the candidate host computers (108a, e-g) is a hardware accelerated host computer.

7. A scheduling node (300) according to claim 6, wherein the scheduler (310) is configured to determine a demand of traffic throughput for a the virtual machine (1 10h), and to determine that the virtual machine is to be hosted in a hardware accelerated host computer if the determined demand for traffic throughput is above a threshold value.

8. A method for scheduling a virtual machine (1 1 Oh) to a host computer (108a) in a datacenter, the method comprising:

receiving (700), by a receiver (304) of a scheduling network node (300), a network traffic profile for the virtual machine, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates;

determining (710), by a scheduler (310), a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile, the closeness factor comprising the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines, wherein the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor; and

transmitting (712), by a transmitter (302), instructions for the determined host computer to host the virtual machine.

9. A non-transitory computer readable medium (309) comprising computer readable code configured, when read and executed by a computer, to carry out the method according to claim 8.

10. A host computer (108a-p) for determining a network traffic profile of a virtual machine (1 10h) in a datacenter, the host computer comprising:

a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine;

a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates; and

a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.

1 1 . A host computer (108) according to claim 10, wherein the network traffic profile comprises an amount of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).

12. A host computer (108) according to claim 10 or 1 1 , wherein the network traffic profile comprises a rate of transmission and/or reception of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).

13. A host computer (108) according to claim 10 or 1 1 , wherein the network traffic profile comprises an indication of a round trip time for the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).

14. A host computer (108) according to any of claims 10 to 13, wherein the network traffic profile comprises data indicating a burstiness of the monitored network traffic sent to and/or received from the one or more peer virtual machines (1 10b-d).

15. A host computer (108) according to any of claims 10 to 14, wherein the network traffic profile comprises data indicating a priority associated with monitored network traffic.

16. A method for determining a network traffic profile of a virtual machine (1 10h) in a datacenter, the method comprising:

monitoring (400), by a monitor (210), network traffic sent to and/or received from the virtual machine; determining (402), by a profiler (212), a network traffic profile for the virtual machine based on the monitored network traffic, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates; and

transmitting (404), by a transmitter (202), the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine.

17. A non-transitory computer readable medium (209) comprising computer readable code configured, when read and executed by a computer, to carry out the method according to claim 16.

18. A system for profiling a virtual machine (1 1 Oh) and scheduling the virtual machine to a selected host computer (108a) in a datacenter, the system comprising: a host computer (108a-p) comprising a monitor (210) configured to monitor network traffic sent to and/or received from the virtual machine, a profiler (212) configured to determine a network traffic profile for the virtual machine based on the monitored network traffic, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates, and a transmitter (202) configured to transmit the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine;

the scheduling node comprising a receiver (304) configured to receive a network traffic profile for the virtual machine, a scheduler (310) configured to determine a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile, the closeness factor comprising the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines, wherein the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor, and a transmitter (302) configured to transmit instructions for the determined host computer to host the virtual machine.

19. A method for profiling a virtual machine (1 1 Oh) and scheduling the virtual machine to a host computer (108a) in a datacenter, the method comprising:

monitoring (800), by a monitor (210) of host computer (108a-p), network traffic sent to and/or received from the virtual machine;

determining (802), by a profiler (212) of the host computer, a network traffic profile for the virtual machine based on the monitored network traffic, wherein the network traffic profile comprises an address of one or more peer virtual machines (1 10b-d) with which the virtual machine communicates;

transmitting (804), by a transmitter (202) of the host computer, the determined profile to a scheduling node (300) within the datacenter for scheduling the virtual machine;

receiving (806), by a receiver (304) of the scheduling node, a network traffic profile for a virtual machine;

determining (818), by a scheduler (310) of the scheduling node, a closeness factor for one or more candidate host computers (108a, e-g), based on the network traffic profile, the closeness factor comprising the inverse of a number of hops required to transmit data from a candidate host computer to one or more of the peer virtual machines, wherein the scheduler is further configured to select a host computer (108a) in which the virtual machine is to be hosted, based on the closeness factor; and

transmitting (820), by a transmitter (302) of the scheduling node, instructions to the determined host computer to host the virtual machine.

20. A non-transitory computer readable medium comprising computer readable code configured, when read and executed by a computer, to carry out the method according to claim 19.