Safe Velocity: A Practical Guide to Software Deployment at Scale using Controlled Rollout
41st International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP '19)
Software companies are increasingly adopting novel approaches to ensure that their products perform correctly, improve user experience, and meet quality standards. Two approaches that have significantly impacted product development are controlled experiments (concurrent experiments with different variations of the same product) and phased rollouts (deployments to smaller audiences, called rings, before deploying broadly). Although each is powerful in isolation, product teams benefit most when the two approaches are integrated.
Intuitively, combining them may seem trivial. In practice and at large scale, however, it is difficult. For example, it requires careful data analysis to correctly handle exposed populations, determine the duration of exposure, and identify the differences between the populations. All of these are needed to increase the likelihood of successful deployments, maximize learnings, and minimize potential harm to users of the products.
In this paper, based on case study research at Microsoft, we introduce controlled rollout (CRL), which applies controlled experimentation to each ring of a traditional phased rollout. We describe its implementation on several products used by hundreds of millions of users, along with the complexities encountered and overcome. In particular, we explain strategies for selecting the length of the rollout period, choosing the metrics of focus, and defining the pass criterion for each of the rings. Finally, we evaluate the effectiveness of CRL by examining hundreds of controlled rollouts at Microsoft Office. With our work, we hope to help other companies optimize their software deployment practices.
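The idea of applying a controlled experiment to each ring can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the ring names, the `RingResult` type, the use of a two-sample z statistic on a single metric, and the threshold of -1.96 as the pass criterion. It only shows the general shape of the approach, in which each ring compares a treatment group against a concurrent control group and the rollout advances only while the ring passes.

```python
import math
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class RingResult:
    """Hypothetical container for one ring's experiment data."""
    ring: str
    control: list      # metric values for users kept on the old build
    treatment: list    # metric values for users given the new build


def z_score(control, treatment):
    """Two-sample z statistic for the difference in means (treatment - control)."""
    se = math.sqrt(variance(control) / len(control)
                   + variance(treatment) / len(treatment))
    return (mean(treatment) - mean(control)) / se


def ring_passes(result, z_threshold=-1.96):
    """Assumed pass criterion: no statistically significant drop in the metric."""
    return z_score(result.control, result.treatment) >= z_threshold


def controlled_rollout(results):
    """Walk the rings in order and stop at the first ring that fails."""
    passed = []
    for result in results:
        if not ring_passes(result):
            break
        passed.append(result.ring)
    return passed


baseline = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
rings = [
    RingResult("ring0", baseline, [1.0, 1.05, 0.95, 1.0, 1.1, 0.9]),    # no change
    RingResult("ring1", baseline, [0.5, 0.6, 0.4, 0.5, 0.55, 0.45]),    # clear regression
    RingResult("ring2", baseline, [1.0, 1.05, 0.95, 1.0, 1.1, 0.9]),    # never reached
]
print(controlled_rollout(rings))
```

In this toy data the second ring shows a large drop in the metric, so the rollout stops there and the third ring is never exposed. A real pass criterion would likely involve several metrics, a chosen exposure duration, and checks that the treatment and control populations are comparable, which are the complexities the abstract mentions.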