A Cloud-Scale Acceleration Architecture

Adrian Caulfield; Eric Chung; Andrew Putnam; Hari Angepat; Jeremy Fowers; Michael Haselman; Stephen Heil; Matt Humphrey; Puneet Kaur; Joo-Young Kim; Daniel Lo; Todd Massengill; Kalin Ovtcharov; Michael Papamichael; Lisa Woods; Sitaram Lanka; Derek Chiou; Doug Burger

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture | October 2016

Published by IEEE Computer Society

[Figure: The Catapult Gen2 card, showing the FPGA and network ports that enable the Configurable Cloud]

Hyperscale datacenter providers have struggled to balance the growing need for specialized hardware (efficiency) with the economic benefits of homogeneity (manageability). In this paper we propose a new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications. This Configurable Cloud architecture places a layer of reconfigurable logic (FPGAs) between the network switches and the servers. This placement enables network flows to be programmably transformed at line rate, accelerates local applications running on the server, and lets the FPGAs communicate directly, at datacenter scale, to harvest remote FPGAs unused by their local servers.

[Figure: Hardware and software compute planes in the Configurable Cloud]

We deployed this design over a production server bed, and show how it can be used for both service acceleration (Web search ranking) and network acceleration (encryption of data in transit at high speeds). This architecture is much more scalable than prior work, which used secondary rack-scale networks for inter-FPGA communication. By coupling to the network plane, direct FPGA-to-FPGA messages can be achieved at latency comparable to previous work, without the secondary network. Additionally, the scale of direct inter-FPGA messaging is much larger. The average round-trip latencies observed in our measurements among 24, 1000, and 250,000 machines are under 3, 9, and 20 microseconds, respectively. The Configurable Cloud architecture has been deployed at hyperscale in Microsoft’s production datacenters worldwide.
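
The abstract's placement of reconfigurable logic between the network switch and the server can be illustrated with a toy software model. The sketch below is an illustration only and is not taken from the paper: the names `Packet`, `InlineFpga`, `Action`, and the `to_fpga` flag are hypothetical, and the XOR transform is a stand-in for a line-rate network function, not real encryption. It shows three paths a packet might take: being consumed by the local FPGA as a direct inter-FPGA message, being forwarded untouched toward another FPGA, or being transformed on its way through.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    CONSUME_LOCALLY = auto()        # inter-FPGA message addressed to this FPGA
    FORWARD_UNCHANGED = auto()      # inter-FPGA message for some other FPGA
    TRANSFORM_AND_FORWARD = auto()  # ordinary flow, transformed in line


@dataclass
class Packet:
    dst_host: str
    payload: bytes
    to_fpga: bool = False  # hypothetical flag marking inter-FPGA traffic


class InlineFpga:
    """Toy stand-in for an FPGA sitting between a switch and its server."""

    def __init__(self, host: str, key: int):
        self.host = host
        self.key = key & 0xFF  # single-byte key for the placeholder transform
        self.inbox = []        # payloads delivered to this FPGA

    def transform(self, payload: bytes) -> bytes:
        # Placeholder network function; XOR is reversible but NOT secure.
        return bytes(b ^ self.key for b in payload)

    def on_packet(self, pkt: Packet):
        """Return (action, outgoing packet or None) for one incoming packet."""
        if pkt.to_fpga:
            if pkt.dst_host == self.host:
                self.inbox.append(pkt.payload)
                return Action.CONSUME_LOCALLY, None
            return Action.FORWARD_UNCHANGED, pkt
        out = Packet(pkt.dst_host, self.transform(pkt.payload))
        return Action.TRANSFORM_AND_FORWARD, out
```

For example, `InlineFpga("a", 0x5A).on_packet(Packet("b", b"hi"))` takes the transform path, while a packet built as `Packet("a", b"msg", to_fpga=True)` is appended to that FPGA's `inbox` and never reaches the server.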