A Study on the Lifecycle of Flaky Tests

ICSE 2020 |

During regression testing, developers rely on the pass or fail outcomes of tests to check whether changes broke existing functionality. Thus, flaky tests, which nondeterministically pass or fail on the same code, are problematic because they provide misleading signals during regression testing. Although flaky tests are the focus of several existing studies, none of them study (1) the reoccurrence, runtimes, and time-before-fix of flaky tests, and (2) flaky tests indepth on proprietary projects. This paper fills this knowledge gap about flaky tests and investigates whether prior categorization work on flaky tests also apply to proprietary projects. Specifically, we study the lifecycle of flaky tests in six large-scale proprietary projects at Microsoft. We find, as in prior work, that asynchronous calls are the leading cause of flaky tests in these Microsoft projects. Therefore, we propose the first automated solution, called Flakiness and Time Balancer (FaTB), to reduce the frequency of flaky-test failures caused by asynchronous calls. Our evaluation of five such flaky tests shows that FaTB can reduce the running times of these tests by up to 78% without empirically affecting the frequency of their flaky-test failures. Lastly, our study finds several cases where developers claim they “fixed” a flaky test but our empirical experiments show that their changes do not fix or reduce these tests’ frequency of flaky-test failures. Future studies should be more cautious when basing their results on changes that developers claim to be “fixes”.