On Domain Transfer When Predicting Intent in Text

NeurIPS Workshop on Document Intelligence |

In many domains, especially enterprise text analysis, there is an abundance of data which can be used for the development of new AI-powered intelligent experiences to improve people’s productivity. However, there are strong-guarantees of privacy which prevent broad sampling and labeling of personal text data to learn or evaluate models of interest. Fortunately, in some cases like enterprise email, manual annotation is possible on certain public datasets. The hope is that models trained on these public datasets would perform well on the target private datasets of interest. In this paper, we study the challenges of transferring information from one email dataset to another, for predicting user intent. In particular, we present approaches to characterizing the transfer gap in text corpora from both an intrinsic and extrinsic point-of-view, and evaluate several proposed methods in the literature for bridging this gap. We conclude with raising issues for further discussion in this arena.