Active learning for sparse Bayesian multi-label classification

ACM SIGKDD Conference on Knowledge Discovery and Data Mining, New York

We study the problem of active learning for multilabel classification.
We focus on the real-world scenario where the
average number of positive (relevant) labels per data point
is small, leading to positive label sparsity. Carrying out
mutual-information-based near-optimal active learning in this
setting is challenging because the computational complexity
involved is exponential in the total number of labels.
We propose a novel inference algorithm for the sparse
Bayesian multilabel model of [17]. The benefit of this alternative
inference scheme is that it enables a natural approximation
of the mutual information objective. We prove that
the approximation leads to an identical solution to the exact
optimization problem but at a fraction of the optimization
cost. This allows us to carry out efficient, non-myopic, and
near-optimal active learning for sparse multilabel classification.
Extensive experiments reveal the effectiveness of the
method.
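
To illustrate why exact mutual-information computations scale exponentially in the number of labels, the following is a minimal sketch and not the inference algorithm proposed here. It assumes, purely for illustration, a toy model of independent Bernoulli labels, and computes the joint label entropy (a building block of mutual information) by enumerating every label vector. The function names `joint_entropy_bruteforce`, `num_label_vectors`, and `num_sparse_label_vectors` are hypothetical. The last function only counts label vectors with a bounded number of positives, to show how much smaller the sparse configuration space is.

```python
from itertools import product
from math import comb, log


def joint_entropy_bruteforce(probs):
    """Entropy in nats of independent Bernoulli labels (toy assumption),
    computed by enumerating all 2^L label vectors."""
    total = 0.0
    for y in product((0, 1), repeat=len(probs)):
        # Probability of this label vector under the independence assumption.
        p = 1.0
        for y_i, p_i in zip(y, probs):
            p *= p_i if y_i else (1.0 - p_i)
        if p > 0.0:
            total -= p * log(p)
    return total


def num_label_vectors(num_labels):
    """Number of terms the brute-force sum above has to visit."""
    return 2 ** num_labels


def num_sparse_label_vectors(num_labels, max_positives):
    """Number of label vectors with at most max_positives positive labels."""
    return sum(comb(num_labels, j) for j in range(max_positives + 1))


if __name__ == "__main__":
    print(joint_entropy_bruteforce([0.1, 0.2, 0.05]))
    print(num_label_vectors(100))
    print(num_sparse_label_vectors(100, 3))
```

The brute-force sum visits 2^L label vectors, which is infeasible beyond a few dozen labels, whereas the number of label vectors with only a few positives grows polynomially in L for a fixed bound.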