July 15, 2016

System Design for Cloud Services

8:30 AM – 3:00 PM

Location: Redmond, WA, USA

  • Speaker: Thomas Wenisch, University of Michigan

    Online Data-Intensive (OLDI) applications, which process terabytes of data with sub-second latencies, are the cornerstone of modern internet services. In this talk, I discuss two system design challenges that make it very difficult to build efficient OLDI applications. (1) Killer Microseconds: today's CPUs are highly effective at hiding the nanosecond-scale latency of memory accesses, and operating systems are highly effective at hiding the millisecond-scale latency of disks. However, modern high-performance networking and flash I/O frequently lead to situations where data are a few microseconds away, and neither hardware nor software offers effective mechanisms to hide microsecond-scale stalls. (2) The Tail at Scale: OLDI services typically rely on sharding data over hundreds of servers to meet latency objectives. However, this strategy mandates waiting for responses from the slowest straggler among these servers. As a result, exceedingly rare events, which have negligible impact on the throughput of a single server, nevertheless come to dominate the latency distribution of the OLDI service. At 1000-node scale, the fifth 9 (the 99.999th percentile) of an individual server's latency distribution becomes the 99th-percentile latency of the entire request. These two challenges cause OLDI operators to run their workloads inefficiently at low utilization to avoid compounding stalls and tails with queueing delays. There is a pressing need for systems researchers to find ways to hide microsecond-scale stalls and to track down and address the rare triggers of 99.999% tail performance anomalies that destroy application-level latency objectives.
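    The 1000-node figure follows from a quick independence calculation: if each server responds within its 99.999th-percentile latency with probability 0.99999, the chance that all 1000 servers in a fan-out do so is 0.99999^1000 ≈ 0.99, so roughly 1% of requests wait on at least one straggler. A minimal sketch (the function name is illustrative):

    ```python
    # Probability that a request fanned out to n servers avoids every
    # server's slow tail, assuming each server independently responds
    # fast with probability p (here p = 0.99999, the "fifth 9").
    def prob_no_straggler(p: float, n: int) -> float:
        return p ** n

    p, n = 0.99999, 1000
    fast = prob_no_straggler(p, n)
    print(f"P(all {n} servers fast)    = {fast:.5f}")      # ~0.99005
    print(f"P(at least one straggler) = {1 - fast:.5f}")   # ~0.00995
    ```

    The same arithmetic shows why per-server tails that look negligible in isolation dominate end-to-end latency at scale.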

  • Speaker: Boris Grot, University of Edinburgh

    Web-scale online services mandate fast access to massive quantities of data. In practice, this is accomplished by sharding the datasets across a pool of servers within a datacenter and keeping each shard in the servers' main memory to avoid long-latency disk I/O. Accesses to non-local shards take place over the datacenter network, incurring communication delays that are 20-1000x greater than accesses to local memory. In this talk, I will introduce Scale-Out NUMA, a rack-scale architecture with an RDMA-inspired programming model that eliminates the chief latency overheads of existing networking technologies and reduces remote memory access latency to within a small factor of local DRAM latency.

  • Speaker: Douglas Carmean, Microsoft

    Traditional technology scaling trends have slowed, motivating many to proclaim the end of Moore's law and of CMOS process technology. While alarmists predict a cataclysmic end to computer systems as we know them, an evolution to new technologies is more likely. This talk will explore the possibilities of hybrid computing systems that may incorporate quantum, cryogenic, and DNA components.

  • Speakers: Sherief Reda, Brown University; Lingjia Tang, University of Michigan-Ann Arbor

    Power management is a central issue in large-scale computing clusters, which consume considerable energy and incur large operational costs. Traditional power management techniques have a centralized design that limits the scalability of computing clusters. We describe a novel framework, DiBA, that achieves optimal power management in a fully decentralized manner. DiBA is a consensus-based algorithm in which each server determines its optimal power consumption locally by exchanging state with its neighbors in the cluster until consensus is reached. We demonstrate the superiority of DiBA on a real cluster and in simulation.
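    The consensus structure described above rests on the classic distributed-averaging building block: each node repeatedly mixes its local value with its neighbors' values, and all nodes converge to a common answer without any central coordinator. The toy sketch below illustrates that building block only; the graph, weights, and names are my own illustration, not DiBA itself.

    ```python
    # Toy distributed-averaging consensus. Each node blends its value
    # with the mean of its neighbors' values; with a connected graph
    # and suitable weights, every node converges to the global average.
    def consensus_step(values, neighbors, weight=0.5):
        new = []
        for i, v in enumerate(values):
            nbr_avg = sum(values[j] for j in neighbors[i]) / len(neighbors[i])
            new.append((1 - weight) * v + weight * nbr_avg)
        return new

    # A 4-node ring; initial per-node "power states" differ.
    neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    values = [100.0, 40.0, 60.0, 80.0]
    for _ in range(50):
        values = consensus_step(values, neighbors)
    print(values)  # every node ends up near the global average, 70.0
    ```

    Each node here touches only its neighbors' state, which is what lets a consensus-based power manager scale without the centralized bottleneck the abstract describes.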

  • Speaker: Lingjia Tang, University of Michigan-Ann Arbor

    As user demand scales for intelligent personal assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana, we are approaching the computational limits of current data center architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications. In this talk, I present the design of Sirius (now called Lucida), an open, end-to-end IPA web-service application that accepts queries in the form of voice and images and responds in natural language. I will then discuss the implications of this type of workload for future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs.

  • Speaker: Sameh Elnikety, Microsoft Research

    Sharing physical resources among independent large-scale applications improves resource utilization and therefore reduces costs. However, if not done carefully, sharing becomes dangerous: it degrades the responsiveness of interactive services, and batch workloads take longer to complete. In this talk, I will describe some of the technical problems facing Microsoft services and platforms, such as Bing Search (internal services) and Azure (external services). I will highlight some of the solutions, particularly for latency-sensitive applications, along with their experimental results. Finally, I will discuss a few subtle problems that arise from resource competition in large applications.

  • Speaker: Benjamin Lee, Duke University

    Datacenter software should share hardware to improve energy efficiency and mitigate energy disproportionality. However, management policies for shared hardware determine whether strategic users participate in consolidated systems. Given strategic behavior, we incentivize participation with mechanisms rooted in algorithmic game theory. First, Resource Elasticity Fairness allocates multiprocessor resources and guarantees sharing incentives, envy-freeness, Pareto efficiency, and strategy-proofness. Second, Repeated Allocation Games allocate heterogeneous processors and guarantee fairness over time. Finally, Computational Sprinting Games allocate performance boosts in datacenters with shared and oversubscribed power supplies, producing an efficient equilibrium. With game theory, we formalize strategic resource competition in shared computer systems.

  • Speaker: Adam Wierman, California Institute of Technology

    Data is broadly being gathered, bought, and sold in a variety of marketplaces today; however, these markets are in their nascent stages. Data is typically obtained through offline negotiations, but online, dynamic cloud data markets are beginning to emerge. As they do, challenging questions related to pricing and privacy are surfacing. This talk will overview some challenges in this regard and describe a novel perspective related to privacy: privacy is not just in the best interest of the consumer; it actually provides a crucial tool for the data seller as well—one that allows a principled approach for versioning.

  • Speaker: Doug Burger, Microsoft Research

    The cloud will fundamentally change our industry and field. The market is consolidating on a small number of vendors who are building out massive, global, hyperscale computers. In this short talk I will lay out some principles behind the trends that I believe will shape the architecture of these new, worldwide computers.

  • Speaker: Karin Strauss, Microsoft Research

    New memory technologies promise denser and cheaper main memory, and may one day displace DRAM. However, many of them experience permanent failures due to wear far more quickly than DRAM. DRAM mechanisms that handle permanent failures rely on very low failure rates and, if directly applied to this new failure model, are extremely inefficient. In this talk, I will discuss our recent work on tolerating wear failures and reducing associated waste by leveraging a managed runtime to abstract away memory layout and work around failures.
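    One way a runtime can work around permanently worn-out memory, sketched below purely as an illustration of the general idea rather than the authors' specific design, is to track failed blocks behind the allocation interface so that application code only ever receives healthy memory. All names and the block granularity here are hypothetical.

    ```python
    # Simplified sketch: an allocator that records permanently failed
    # blocks and transparently skips them on allocation, hiding the
    # physical memory layout from the application.
    class WearAwareAllocator:
        def __init__(self, num_blocks: int):
            self.failed = set()                 # blocks with permanent wear failures
            self.free = list(range(num_blocks)) # healthy-or-unknown free blocks

        def mark_failed(self, block: int) -> None:
            self.failed.add(block)

        def allocate(self) -> int:
            while self.free:
                block = self.free.pop()
                if block not in self.failed:
                    return block                # hand out only healthy blocks
                # failed blocks are silently retired, never reused
            raise MemoryError("no healthy blocks left")

    alloc = WearAwareAllocator(num_blocks=8)
    alloc.mark_failed(7)
    alloc.mark_failed(6)
    print(alloc.allocate())  # 5: blocks 7 and 6 are skipped
    ```

    A managed runtime can apply the same indirection at object granularity, since it already controls object placement and can relocate data without application involvement.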

  • Speaker: Ion Stoica, University of California-Berkeley

    To fully realize the value of data, we need the ability to respond and act on the latest data in real-time at global scale, while preserving user privacy and ensuring application security. In this talk I’ll outline the challenges and the research opportunities of real-time automated decision making, and present our plans at Berkeley to tackle these challenges. These efforts are part of the new UC Berkeley RISE (Real-time Intelligent Secure Execution) lab.

  • Speaker: Marc Tremblay, Microsoft

    This talk will cover how the co-design of devices across silicon, systems, and software, in the context of a fully integrated design team, applies to optimizing hyperscale datacenters running internal cloud workloads as well as hundreds of thousands of customer workloads on virtual machines. Simulation results based on these workloads and other benchmarks will be presented to improve our understanding of the impact of technologies such as large L4 caches and high-bandwidth memory.