Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-Demand VMs
- Fangkai Yang
- Lu Wang
- Zhenyu Xu
- Jue Zhang
- Liqun Li
- Bo Qiao
- Camille Couturier
- Chetan Bansal
- Soumya Ram
- Si Qin
- Zhen Ma
- Íñigo Goiri
- Eli Cortez
- Terry Yang
- Victor Ruehle
- Saravan Rajmohan
- Qingwei Lin 林庆维
- Dongmei Zhang
Organized by ACM
Cloud providers often have underutilized resources, which they may offer at a lower price to compensate for their reduced availability. However, customers may be hesitant to use such offerings (e.g., spot VMs), because trading off cost against resource availability is not always straightforward. In this work, we propose Snape (Spot On-demand Perfect Mixture), an intelligent framework that optimizes cost and resource availability by dynamically mixing on-demand VMs with spot VMs. Through a detailed characterization based on real production traces, we verify that spot VM evictions are predictable to some extent. Snape also leverages constrained reinforcement learning to adjust the mixture policy online. Experiments across different configurations show that Snape achieves 44% cost savings compared to using only on-demand VMs while maintaining 99.96% availability, which is 2.77% higher than using only spot VMs.
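The trade-off that Snape navigates can be made concrete with a deliberately simple, hypothetical model. This is not Snape's method: Snape learns its mixture policy online with constrained reinforcement learning, whereas the sketch below only shows the underlying constrained optimization for a single interval. It assumes that on-demand VMs are never evicted, that each spot VM is evicted with a known probability `eviction_prob` and stays unavailable for the rest of the interval, and that spot VMs are cheaper by a fixed fraction `spot_discount`. The names `evaluate_mix` and `choose_mix` are illustrative, not from the paper.

```python
from dataclasses import dataclass


@dataclass
class MixOutcome:
    spot_fraction: float   # share of the fleet running on spot VMs
    relative_cost: float   # cost relative to an all-on-demand fleet (1.0 = no savings)
    availability: float    # expected fraction of VMs still running


def evaluate_mix(spot_fraction: float, eviction_prob: float, spot_discount: float) -> MixOutcome:
    # Hypothetical model: on-demand VMs are never evicted; an evicted spot VM
    # stays unavailable for the whole interval.
    relative_cost = (1 - spot_fraction) + spot_fraction * (1 - spot_discount)
    availability = 1 - spot_fraction * eviction_prob
    return MixOutcome(spot_fraction, relative_cost, availability)


def choose_mix(eviction_prob: float, spot_discount: float,
               availability_target: float, steps: int = 100) -> MixOutcome:
    # Constrained choice: minimize cost subject to availability >= target,
    # via a grid search over spot fractions 0, 1/steps, ..., 1.
    best = evaluate_mix(0.0, eviction_prob, spot_discount)  # all on-demand is the fallback
    for i in range(1, steps + 1):
        candidate = evaluate_mix(i / steps, eviction_prob, spot_discount)
        if (candidate.availability >= availability_target
                and candidate.relative_cost < best.relative_cost):
            best = candidate
    return best


if __name__ == "__main__":
    mix = choose_mix(eviction_prob=0.03, spot_discount=0.7, availability_target=0.99)
    print(mix)
```

With a 3% eviction probability, a 70% spot discount, and a 99% availability target, `choose_mix` returns a mix with 33% spot VMs and a relative cost of about 0.769. A better eviction predictor lets `eviction_prob` vary per interval, which is where the predictability of evictions pays off in this toy model.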