Overview
Azure Research – Systems is a research group in Azure Core that brings forward-looking, world-class systems research directly into Azure. The group was seeded from the Cloud Efficiency team, which moved from the Systems Research Group at Microsoft Research into Azure for closer integration with the product.
Our group’s main mission is to improve the cost efficiency of Microsoft’s online services and datacenters. We pursue this mission by working closely with the company’s product groups to (1) propose and lead joint projects that improve efficiency, and (2) do research on potential future efficiency improvements.
Impact of our research
Some of our main tech transfers and corresponding papers
- The server shutdown component of our power emergency management system (described in our ISCA 2021 paper) went into production in June 2023. The system allows datacenters to allocate all of their reserve/redundant power and host more servers.
- Harvest VMs v2 for harvesting underutilized cores (described as Elastic VMs in our EuroSys 2021 paper) went into production in January 2023.
- The server throttling component of our power emergency management system (also described in our ISCA 2021 paper) went into production in March 2021.
- Our per-VM power capping software (described in our ATC 2021 paper) went into production in October 2020.
- Our hybrid policy for managing cold starts in serverless platforms (described in our ATC 2020 paper) went into production in Azure Functions in June 2020.
- Harvest VMs v1 for harvesting unallocated cores and Harvest Hadoop, our modification of YARN and HDFS to benefit from Harvest VMs (described in our OSDI 2020 paper), went into production in November 2019.
- Our power capping and oversubscription software went into production in July 2018.
- Our tail latency mitigation techniques for HDFS (described in our EuroSys 2019 paper) went into production in June 2018.
- Resource Central, our ML and prediction-serving system for cloud platforms (described in our SOSP 2017 paper), went into production in March 2018.
- Router-Based HDFS Federation, our system for transparently scaling HDFS to datacenter sizes (described in our ATC 2017 paper), went into production in June 2017.
- CPU blind isolation for harvesting spare CPU cycles (described in our ATC 2018 paper with the DMX group at MSR) went into production in August 2016.
- Perflite, a tool for VM utilization analysis and optimization built from our Floodlight tool, went into production in February 2016.
- Our resource-harvesting YARN/HDFS stack and HDFS data placement algorithm for harvesting spare storage (described in our OSDI 2016 paper) went into production in January 2016.
- Our analysis of disk reliability (described in our FAST 2016 award paper) prompted the adoption of a new ambient control policy for Microsoft’s free-cooling datacenters starting in 2015.
None of these successes would have been possible without our close partnership with teams in Azure, Bing, CO+I, AHSI, and Windows/Hyper-V.
Some of our recent best paper awards
- Pond: CXL-Based Memory Pooling Systems for Cloud Platforms. ASPLOS 2023. Huaicheng Li, Daniel S. Berger, Stanko Novakovic, Lisa Hsu, Dan Ernst, Pantea Zardoshti, Monish Shah, Samir Rajadnya, Scott Lee, Ishwar Agarwal, Mark D. Hill, Marcus Fontoura, Ricardo Bianchini. 🏅Distinguished Paper Award.
- Ensō: A Streaming Interface for NIC-Application Communication. OSDI 2023. Hugo Sadok, Nirav Atre, Zhipeng Zhao, Daniel S. Berger, James C. Hoe, Aurojit Panda, Justine Sherry, Ren Wang. 🏅Best Paper Award.
- Overclocking in Immersion-Cooled Datacenters. 🏅IEEE Micro Top Picks from the 2021 Computer Architecture Conferences. Pulkit Misra, Ioannis Manousakis, Esha Choukse, Majid Jalili, Íñigo Goiri, Ashish Raniwala, Brijesh Warrier, Husam Alissa, Bharath Ramakrishnan, Phillip Tuma, Christian Belady, Marcus Fontoura, Ricardo Bianchini.
Complete list of publications here.