ODIN: Optimal Discovery of High-value INformation Using Model-based Deep Reinforcement Learning

Real-world Sequential Decision Making Workshop, ICML

We consider the problem of active feature selection, in which we dynamically choose the set of features that achieves the highest predictive performance for a given task. We propose a model-based deep reinforcement learning framework for Optimal Discovery of high-value INformation (ODIN), in which the agent chooses either to request a new feature or to stop and predict. Using the partial variational autoencoder (Ma et al., 2018), the framework models the conditional distribution of the features, which allows for data efficiency. We introduce a novel cost function that is sensitive to both the cost and the order of feature acquisition. ODIN handles missing data naturally and ensures the globally optimal solution for the most efficient feature acquisition while preserving data efficiency. We show improved performance on both synthetic and real-life datasets.
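
The acquire-or-stop decision process described above can be illustrated with a minimal sketch. It is not the ODIN implementation: the class name FeatureAcquisitionEnv, the zero-filled observation with a mask, the reward of 1 for a correct prediction, and the flat per-feature cost are all assumptions made for illustration. In particular, the order-sensitive cost function proposed in the paper is not reproduced here, and the partial variational autoencoder is left out entirely.

```python
import numpy as np


class FeatureAcquisitionEnv:
    """Toy episode (hypothetical, not the authors' code): the agent sees a
    partially observed feature vector and either acquires one more feature
    or stops and predicts the label."""

    def __init__(self, x, y, costs):
        self.x = np.asarray(x, dtype=float)          # full feature vector, hidden from the agent
        self.y = y                                   # true label
        self.costs = np.asarray(costs, dtype=float)  # assumed flat cost per feature
        self.n = self.x.size
        self.stop_action = self.n                    # action index n means "stop and predict"
        self.mask = np.zeros(self.n, dtype=bool)     # True where a feature has been acquired

    def observe(self):
        # Unobserved entries are zero-filled; the mask distinguishes them from true zeros.
        return np.where(self.mask, self.x, 0.0), self.mask.copy()

    def step(self, action, predict=None):
        """Return (observation, reward, done)."""
        if action == self.stop_action:
            values, mask = self.observe()
            correct = predict(values, mask) == self.y
            # Assumed terminal reward: 1 for a correct prediction, 0 otherwise.
            return self.observe(), (1.0 if correct else 0.0), True
        if self.mask[action]:
            raise ValueError("feature already acquired")
        self.mask[action] = True
        # Assumed acquisition reward: the negative cost of the requested feature.
        return self.observe(), -self.costs[action], False


# Usage: acquire feature 1, then stop and predict with a hand-written rule.
env = FeatureAcquisitionEnv(x=[0.2, 1.5, -0.7], y=1, costs=[0.1, 0.3, 0.2])
_, r1, done1 = env.step(1)
_, r2, done2 = env.step(env.stop_action, predict=lambda v, m: int(v[1] > 1.0))
```

In this sketch the policy would be trained to maximize the sum of rewards, so that it requests a feature only when the expected gain in prediction accuracy outweighs its cost. In a model-based setting, a generative model of the unobserved features given the observed ones would supply simulated transitions in place of real acquisitions.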