Intelligible, Interpretable, and Transparent Machine Learning

The importance of intelligibility and transparency in machine learning

Most real datasets have hidden biases. Being able to detect the impact of the bias in the data on the model, and then to repair the model, is critical if we are going to deploy machine learning in applications that affect people’s health, welfare, and social opportunities. This requires models that are intelligible.

In machine learning, there is often a tradeoff between accuracy and intelligibility: the most accurate machine learning models usually are not very intelligible (for example, deep neural nets, boosted trees, random forests, and support vector machines), and the most intelligible models usually are less accurate (for example, linear or logistic regression). This tradeoff often limits the accuracy of models that can be applied in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust a learned model is important.

At Microsoft, we have developed a learning method based on generalized additive models that is as accurate as full-complexity models such as random forests, but that remains as intelligible as, and in some cases even more intelligible than, models such as linear and logistic regression. We've done this by applying modern machine learning methods and computational horsepower to the problems of training accurate generalized additive models and modeling important pairwise interactions. For many datasets, the new learning method is just as accurate as any other, but far more intelligible.
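To make the idea of a generalized additive model concrete, here is a minimal sketch in Python. It is not the method described above. It is a simplified stand-in that fits one piecewise-constant "shape function" per feature by cycling through the features and nudging each one toward the current residuals (squared loss, equal-width bins). Pairwise interaction terms are left out, and the function names, bin count, and learning rate are illustrative assumptions.

```python
import random


def bin_index(model, j, value):
    """Map a raw value of feature j to one of the equal-width bins (clamped at the ends)."""
    lo, hi = model["lo"][j], model["hi"][j]
    if hi == lo:
        return 0
    b = int((value - lo) / (hi - lo) * model["n_bins"])
    return min(max(b, 0), model["n_bins"] - 1)


def fit_additive_model(X, y, n_bins=8, n_rounds=200, learning_rate=0.1):
    """Fit y ~ intercept + sum_j f_j(x_j), where each f_j is a per-bin lookup table."""
    n, d = len(X), len(X[0])
    model = {
        "lo": [min(row[j] for row in X) for j in range(d)],
        "hi": [max(row[j] for row in X) for j in range(d)],
        "n_bins": n_bins,
        "intercept": sum(y) / n,
        "shapes": [[0.0] * n_bins for _ in range(d)],
    }
    bins = [[bin_index(model, j, row[j]) for j in range(d)] for row in X]
    pred = [model["intercept"]] * n

    for _ in range(n_rounds):
        for j in range(d):  # visit one feature at a time
            sums = [0.0] * n_bins
            counts = [0] * n_bins
            for i in range(n):
                sums[bins[i][j]] += y[i] - pred[i]  # residual falls in this bin
                counts[bins[i][j]] += 1
            # Small step toward the mean residual of each non-empty bin.
            steps = [learning_rate * s / c if c else 0.0 for s, c in zip(sums, counts)]
            for b in range(n_bins):
                model["shapes"][j][b] += steps[b]
            for i in range(n):
                pred[i] += steps[bins[i][j]]
    return model


def predict(model, row):
    """A prediction is the intercept plus one table lookup per feature."""
    return model["intercept"] + sum(
        shape[bin_index(model, j, row[j])] for j, shape in enumerate(model["shapes"])
    )


# Synthetic data, made up for illustration: one curved effect, one linear effect.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(400)]
y = [x0 ** 2 + 2 * x1 for x0, x1 in X]
model = fit_additive_model(X, y)
```

Because every prediction is a sum of per-feature lookups, each entry of `model["shapes"]` can be plotted or read on its own to see what the model learned about that feature. That property is what keeps this family of models intelligible.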

We've applied transparent learning to problems in healthcare, such as diabetes, pneumonia, and 30-day hospital readmission risk prediction. We've also applied the new method to important social problems such as recidivism prediction and credit scoring, where bias based on race, gender, and nationality is an important issue to take into account.

In addition to making transparent what the models have learned, the new learning methods also make it easier to edit the models to remove bias or other errors that may have been introduced in the learning process. This is important because it is not enough to know that a model has learned something inappropriate; one must also have a way of repairing the model to fix the issue.
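As a rough illustration of what editing an additive model can look like, the sketch below represents a model as an intercept plus one lookup table per feature and then flattens one table to zero. The feature names, table values, and the helper `remove_feature_effect` are all hypothetical and made up for this example. They are not taken from any real model or library.

```python
import copy

# Hypothetical additive model: score = intercept + one contribution per feature.
# All names and numbers here are invented for illustration.
model = {
    "intercept": 0.10,
    "shapes": {
        "age_band": {"18-30": -0.05, "31-50": 0.00, "51+": 0.08},
        "group": {"A": 0.12, "B": -0.12},
    },
}


def predict(model, row):
    """Sum the intercept and the looked-up contribution of each feature."""
    return model["intercept"] + sum(
        table[row[name]] for name, table in model["shapes"].items()
    )


def remove_feature_effect(model, name):
    """Return an edited copy in which feature `name` contributes nothing."""
    edited = copy.deepcopy(model)
    edited["shapes"][name] = {level: 0.0 for level in edited["shapes"][name]}
    return edited


row = {"age_band": "51+", "group": "A"}
edited = remove_feature_effect(model, "group")
before = predict(model, row)   # includes the contribution of "group"
after = predict(edited, row)   # same row, with the "group" term zeroed out
```

The edit is local: only the chosen term changes, and the other terms keep exactly what they learned. Zeroing a term does not by itself remove an effect that correlated features may still carry, so an edited model would still need to be re-examined.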

People

Rich Caruana

Senior Principal Researcher

Paul Koch

Principal Research Software Engineer

Nick Craswell

Principal Architect

Tom Finley

Principal Software Engineer

Microsoft

Harsha Nori

Director, Research Engineering