Testing Machine Learning Systems in Industry: An Empirical Study

ICSE 2022

Machine learning (ML) is becoming increasingly prevalent and is being integrated into a wide range of software systems. These systems, referred to as ML systems, must be adequately tested to gain confidence that they behave correctly. Although many research efforts have been devoted to testing techniques for ML systems, industrial teams face new challenges when testing ML systems in real-world settings. To gain insight from industry into the problems of ML testing, we conducted an empirical study comprising a survey with 87 responses and interviews with 7 senior practitioners from well-known IT companies who work on different ML systems. Our study uncovers significant industrial concerns about the major testing activities, i.e., test data collection, test execution, and test result analysis, as well as some good practices and open challenges from the industry's perspective. \textbf{(1) Test data collection} is conducted differently for the ML model, the data, and the code, and each faces distinct challenges. \textbf{(2) Test execution} in ML systems suffers from two major problems: entanglement among components and regressions in model performance. \textbf{(3) Test result analysis} centers on quantitative methods, e.g., metric-based evaluation, and is complemented by qualitative methods based on practitioners' experience. Based on our findings, we highlight research opportunities and provide implications for practitioners.