Spot Virtual Machine Eviction Prediction in Microsoft Cloud
- Fangkai Yang
- Bowen Pang
- Jue Zhang
- Bo Qiao
- Lu Wang
- Camille Couturier
- Chetan Bansal
- Soumya Ram
- Si Qin
- Zhen Ma
- Íñigo Goiri
- Eli Cortez
- Senthil Baladhandayutham
- Victor Rühle
- Saravan Rajmohan
- Qingwei Lin 林庆维
- Dongmei Zhang
TheWebConf 2022
Azure Spot Virtual Machines (Spot VMs) utilize unused compute capacity at significant cost savings. They can be evicted when Azure needs the capacity back and are therefore suitable for workloads that can tolerate interruptions. A good prediction of Spot VM evictions helps Azure optimize capacity utilization and gives users information to better plan Spot VM deployments by selecting clusters that reduce potential evictions. The current in-service cluster-level prediction method aggregates node information and thus ignores node heterogeneity. In this paper, we propose a spatial-temporal node-level Spot VM eviction prediction model that captures inter-node relations and time dependency. Experiments with Azure data show that our node-level eviction prediction model outperforms both node-level and cluster-level baselines.
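
To make the point about node heterogeneity concrete, the following is a minimal sketch rather than the paper's method. It assumes a simple mean as the cluster-level aggregate, and the cluster names, the per-node eviction rates, and the helper functions `cluster_level_rate` and `node_level_spread` are all hypothetical toy values chosen for illustration, not Azure data. It shows how two clusters with very different node-level behavior can look identical once node information is aggregated.

```python
from statistics import mean

# Hypothetical per-node eviction rates (fraction of Spot VMs evicted).
# Toy numbers for illustration only; they are not Azure measurements.
cluster_nodes = {
    "cluster_a": [0.0, 0.0, 0.0, 0.8],  # one node carries all the eviction risk
    "cluster_b": [0.2, 0.2, 0.2, 0.2],  # risk is spread evenly across nodes
}


def cluster_level_rate(node_rates):
    """Aggregate node rates into one cluster-level figure (assumed: simple mean)."""
    return mean(node_rates)


def node_level_spread(node_rates):
    """Range of node-level rates, a crude indicator of node heterogeneity."""
    return max(node_rates) - min(node_rates)


for name, rates in cluster_nodes.items():
    print(
        name,
        "cluster-level rate:", round(cluster_level_rate(rates), 3),
        "node-level spread:", round(node_level_spread(rates), 3),
    )
```

Under these assumed numbers both clusters share the same aggregated rate, while the node-level spread differs sharply. A node-level model can in principle use that difference, whereas an aggregate discards it.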