Keeper: Automated Testing and Fixing of Machine Learning Software

  • Chengcheng Wan,
  • Shicheng Liu,
  • Sophie Xie,
  • Yuhan Liu,
  • Henry Hoffmann,
  • Michael Maire

ACM Transactions on Software Engineering and Methodology

The increasing number of software applications incorporating machine learning (ML) solutions has created a need for corresponding testing techniques. However, testing ML software requires tremendous human effort to design realistic, relevant test inputs and to judge the correctness of software output according to human common sense. Even when misbehavior is exposed, it is often unclear whether the defect lies inside the ML API or in the surrounding code, and how to fix the implementation. This article tackles these challenges by proposing Keeper, an automated testing and fixing tool for ML software.

The core idea of Keeper is to design pseudo-inverse functions that empirically reverse the corresponding ML task and serve as a proxy for common human judgment of real-world data. Keeper incorporates these functions into a symbolic execution engine to generate tests. It also detects code smells that degrade software performance. Once misbehavior is exposed, Keeper attempts to change how the ML APIs are used in order to alleviate the misbehavior.
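To make the pseudo-inverse idea concrete, the sketch below illustrates it for a speech-to-text API: known text is turned back into audio (the inverse direction of the ML task), fed to the application under test, and the observed behavior is compared against the expectation implied by that text. The helper names (synthesize_speech, transcribe_audio, application_under_test, generate_test) are illustrative placeholders under assumed interfaces, not Keeper's actual implementation.

```python
# Minimal sketch of the pseudo-inverse idea, assuming a speech-to-text ML API.
# All functions are hypothetical stand-ins so the sketch stays self-contained.

def synthesize_speech(text: str) -> bytes:
    """Pseudo-inverse of speech-to-text: turn known text into 'audio'.

    In practice this would call a text-to-speech service; here it is a stub.
    """
    return text.encode("utf-8")  # placeholder audio bytes


def transcribe_audio(audio: bytes) -> str:
    """Stand-in for the ML speech-to-text API used by the application."""
    return audio.decode("utf-8")  # placeholder transcription


def application_under_test(audio: bytes) -> str:
    """Toy application whose control flow branches on the ML API's output."""
    text = transcribe_audio(audio)
    if "turn on the light" in text.lower():
        return "light_on"
    return "unknown_command"


def generate_test(expected_phrase: str, expected_action: str) -> bool:
    """Build a realistic test input via the pseudo-inverse and check whether
    the application's behavior matches the expectation derived from the text,
    acting as a proxy for human judgment."""
    audio = application_input = synthesize_speech(expected_phrase)
    actual = application_under_test(application_input)
    return actual == expected_action


if __name__ == "__main__":
    # Two generated tests that exercise both branches of the toy application.
    print(generate_test("Turn on the light", "light_on"))        # True
    print(generate_test("Play some music", "unknown_command"))   # True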

Our evaluation on a variety of applications shows that Keeper greatly improves branch coverage, while identifying 74 previously unknown failures and 19 code smells from 56 out of 104 applications. Our user studies show that 78% of end-users and 95% of developers agree with Keeper’s detection and fixing results.