Towards predictable datacenter networks

Authors: 
Hitesh Ballani, Paolo Costa, Thomas Karagiannis and Ant Rowstron
Published: 
SIGCOMM
Year: 
2011

This paper looks at reducing performance variability in datacentre networks. Variable network performance leads to unpredictable application performance, which is a particular problem for cloud applications. Virtual networks are proposed as a solution, isolating each "tenant's" performance from the underlying physical infrastructure. The system presented, Oktopus, maps virtual networks onto the physical network, providing a tradeoff between guarantees to tenants, costs to tenants and profits to providers.

The introduction includes a useful survey of intra-cloud network bandwidth, drawing on results from past studies (figure 1). Large differences (a factor of five or more in some studies) are observed.

Tenant requests are abstracted to simply a number N of virtual machines (assumed standard in terms of CPU, memory and storage) and a bandwidth B, forming a virtual switch with total bandwidth N·B but with a per-VM limit of B. The request is written <N,B>. This is known as a "Virtual cluster".

Some networks are oversubscribed – in these cases a request is described as <N,B,S,O>, where S is the size of the groups into which the N machines are clustered (N/S groups) and O is an oversubscription factor. VMs in a group are connected by a virtual group switch with per-VM capacity B, and groups are interconnected by a link of capacity B' = S·B/O. This is known as a "Virtual oversubscribed cluster".
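The capacities implied by the two abstractions can be sketched as follows (function names are mine, not the paper's):

```python
def virtual_cluster_capacity(n, b):
    """<N,B>: a single virtual switch with per-VM links of
    bandwidth B, so aggregate switch bandwidth is N*B."""
    return n * b

def voc_inter_group_capacity(s, b, o):
    """<N,B,S,O>: groups of S VMs each hang off a virtual group
    switch; the link from each group switch up to the root has
    capacity B' = S*B / O, where O is the oversubscription factor."""
    return s * b / o

# Example: groups of 10 VMs at 100 Mbps each, oversubscribed 4x.
# The group's internal switch sees up to 1000 Mbps, but its uplink
# carries only 250 Mbps.
print(virtual_cluster_capacity(10, 100))
print(voc_inter_group_capacity(10, 100, 4))
```

With O = 1 the virtual oversubscribed cluster degenerates to the plain virtual cluster, which is why the paper treats <N,B> as the special case.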

Other topologies could be offered – for example a clique (which would have high max bandwidth but limit provider flexibility).

The Oktopus system (named after Paul the psychic octopus – hence the German spelling) implements these virtual abstractions. Requesting tenants can ask for a virtual cluster or a virtual oversubscribed cluster. A management plane (logically centralised) performs admission control and maps requests to physical machines. A data plane uses rate limiting at end-hosts to enforce bandwidth limits.

The algorithm that maps requests onto physical networks focuses on tree-like physical topologies. Allocating a virtual oversubscribed cluster reduces to allocating N/S virtual clusters of size <S,B> with links of capacity B' between them.

Tenants without virtual networks are guaranteed a fair share of network bandwidth. Tenants on virtual networks are given high-priority traffic with guaranteed but limited bandwidth; other traffic gets a fair share of the residual bandwidth. Limiting the share of total bandwidth used by virtual networks guarantees a minimum QoS for non-virtual-network clients.
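One simple way an end-host shim could keep a VM within its cap B is to split the cap across active destinations in proportion to demand – an illustrative sketch only, as the paper's actual enforcement mechanism is more involved:

```python
def rate_limits(b, demands):
    """Split a VM's bandwidth cap b across destinations.

    demands: dict mapping destination VM -> offered rate.
    Returns a dict mapping destination -> allowed rate, whose
    values sum to at most b. If total demand is under the cap,
    no limiting is needed.
    """
    total = sum(demands.values())
    if total <= b:
        return dict(demands)
    # Over the cap: scale each flow down proportionally.
    return {dest: b * d / total for dest, d in demands.items()}

# Two flows demanding 150 and 50 against a 100 Mbps cap are
# scaled to 75 and 25.
print(rate_limits(100, {"vm-a": 150, "vm-b": 50}))
```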

In non-tree networks routing may be an issue; load balancing is used to spread traffic across paths.

Evaluation is by simulation, alongside a testbed deployment of 25 physical machines. The simulation uses a three-level tree topology with no path diversity, modelling a datacentre of 16,000 physical machines with 4 VMs per machine. The machines are arranged in racks of 40 with a top-of-rack switch; these connect to aggregation switches, which connect to a datacentre core switch.

The allocation algorithm for VMs is greedy (the full problem is NP-hard). If m out of N VMs are placed on a single physical machine then N−m must be on other physical machines, so min(m, N−m)·B bandwidth must be available on that machine's outbound link. Assume K slots for VMs per machine and link capacity C; let k_v be the number of empty slots on machine v, and R_l the residual bandwidth on link l. The set of numbers of VMs from a request <N,B> that can be allocated on machine v with outbound link l is: M_v = { m ∈ [0, min(k_v, N)] : min(m, N−m)·B ≤ R_l }.
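The feasibility set M_v translates directly into code – a minimal sketch of the check above, not the authors' implementation:

```python
def feasible_vm_counts(n, b, k_v, r_l):
    """M_v for a request <N,B> on machine v with outbound link l.

    n, b : request size and per-VM bandwidth
    k_v  : empty VM slots on machine v
    r_l  : residual bandwidth on v's outbound link l

    Placing m VMs here leaves n-m elsewhere, so the outbound link
    must carry min(m, n-m)*b.
    """
    return [m for m in range(0, min(k_v, n) + 1)
            if min(m, n - m) * b <= r_l]

# 4 VMs at 100 Mbps each; 2 free slots, 150 Mbps residual:
# m=2 would need min(2,2)*100 = 200 Mbps, so only 0 or 1 fit.
print(feasible_vm_counts(4, 100, 2, 150))
```

Note that m = N is always feasible when enough slots exist, even with zero residual bandwidth: colocating the whole request needs no network capacity, which is what drives the greedy algorithm to pack VMs close together.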

Simulations model tenant jobs as independent tasks, with flows of uniform length L between tasks on individual VMs. The simulated workload makes some simple assumptions about task completion times, and the network is assumed to have more demand for bandwidth than it can handle. The baseline is tenants asking only for <N> machines, which are greedily allocated as close together as possible; this is extended to <N,B> and <N,B,S,O> to see the effects on completion times. Completion times are much reduced under the oversubscription-aware placement policies, as bandwidth is used more efficiently. This efficiency would also allow providers to operate clouds closer to 100% utilisation (Amazon is apparently at 70–80%).

Page generated 2014-01-07, by jemdoc.