Contextual Bandit for Active Learning: Active Thompson Sampling

  • Djallel Bouneffouf,
  • Romain Laroche,
  • Tanguy Urvoy,
  • Raphael Feraud,
  • Robin Allesiardo

Proceedings of the 21st International Conference on Neural Information Processing (ICONIP)

The labelling of training examples is a costly task in supervised classification. Active learning strategies address this problem by selecting the most useful unlabelled examples for training a predictive model. The choice of which examples to label can be seen as a dilemma between exploration and exploitation over the data space representation. In this paper, a novel active learning strategy manages this trade-off by modelling the active learning problem as a contextual bandit problem. We propose a sequential algorithm named Active Thompson Sampling (ATS) which, in each round, assigns a sampling distribution on the pool, samples one point from this distribution, and queries the oracle for this sample point's label. Experimental comparison to previously proposed active learning algorithms shows superior performance on a real application dataset.
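The abstract describes ATS only at the level of its per-round loop (build a sampling distribution over the unlabelled pool, draw one point, query the oracle). The sketch below illustrates that loop under assumptions of our own: a Bayesian linear model over contexts, a softmax sampling distribution over Thompson-sampled scores, and a simple agreement-based reward. None of these modelling choices come from the paper itself; they are stand-ins to make the round structure concrete.

```python
# Illustrative Active-Thompson-Sampling-style loop, assuming a Bayesian
# linear model, a softmax sampling distribution, and an agreement reward.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pool: contexts in R^d with hidden binary labels (the "oracle").
d, n_pool = 5, 500
X_pool = rng.normal(size=(n_pool, d))
w_true = rng.normal(size=d)
y_pool = (X_pool @ w_true > 0).astype(int)

# Bayesian linear model: posterior over theta is N(B^{-1} f, B^{-1}).
B = np.eye(d)          # posterior precision
f = np.zeros(d)        # precision-weighted mean
labelled = []          # indices already queried

for t in range(50):
    mu = np.linalg.solve(B, f)
    cov = np.linalg.inv(B)

    # Thompson step: sample a parameter vector from the posterior.
    theta = rng.multivariate_normal(mu, cov)

    # Assign a sampling distribution on the still-unlabelled pool,
    # here a softmax over the sampled scores (an assumed choice).
    candidates = [i for i in range(n_pool) if i not in labelled]
    scores = X_pool[candidates] @ theta
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()

    # Sample one point from this distribution and query the oracle.
    i = rng.choice(candidates, p=probs)
    label = y_pool[i]
    labelled.append(i)

    # Illustrative reward: 1 if the sampled score agreed with the label.
    reward = 1.0 if (scores[candidates.index(i)] > 0) == bool(label) else 0.0

    # Bayesian linear-regression update of the posterior with this reward.
    x = X_pool[i]
    B += np.outer(x, x)
    f += reward * x

print(f"queried {len(labelled)} points")
```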