De-Aliasing States In Dialogue Modelling With Inverse Reinforcement Learning

  • Layla El Asri
  • Adam Trischler
  • Geoff Gordon

Speaking Machines

End-to-end dialogue response generation models learn dialogue state tracking, dialogue management, and natural language generation jointly, from a single training signal. These models scale better than traditional modular architectures because they require little intermediate annotation. Despite significant advances, these models, often built on Recurrent Neural Networks (RNNs), exhibit deficiencies such as repetition, inconsistency, and low task-completion rates. To understand some of these issues more deeply, this paper investigates the representations learned by RNNs trained on dialogue data.

We highlight the problem of state aliasing, whereby two or more distinct dialogue states are conflated in the representation space. We show empirically that state aliasing often occurs when encoder-decoder RNNs are trained via maximum likelihood or policy gradients. We propose to augment the training signal with information about the dialogue future, so that the RNNs' latent representations are forced to retain enough information to predict it. Specifically, we train encoder-decoder RNNs to predict both the next utterance and a feature vector that represents the expected dialogue future. We draw inspiration from the Structured-Classification Inverse Reinforcement Learning (SCIRL; Klein et al., 2012) algorithm to compute this feature vector. In experiments on a generated dataset of text-based games, the augmented training signal mitigates state aliasing and significantly improves model performance.
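To make the auxiliary objective concrete, the sketch below shows one way such a model could be set up. It is a minimal illustration, not the paper's exact architecture: the PyTorch GRU encoder-decoder, the Monte-Carlo estimate of a SCIRL-style discounted feature expectation, the auxiliary loss weight, and all layer names, sizes, and data in the toy training step are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DialogueSeq2Seq(nn.Module):
    """Encoder-decoder GRU with an auxiliary head that predicts a
    feature vector summarising the expected dialogue future."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, feat_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)        # next-utterance tokens
        self.future_head = nn.Linear(hidden_dim, feat_dim)  # expected-future features

    def forward(self, src_tokens, tgt_tokens):
        _, state = self.encoder(self.embed(src_tokens))      # state: (1, B, H)
        dec_out, _ = self.decoder(self.embed(tgt_tokens), state)
        token_logits = self.out(dec_out)                      # (B, T, V)
        future_pred = self.future_head(state.squeeze(0))      # (B, feat_dim)
        return token_logits, future_pred


def discounted_feature_expectations(state_features, gamma=0.9):
    """Monte-Carlo estimate of a SCIRL-style feature expectation
    mu_t = sum_{k >= t} gamma^(k-t) * phi(s_k) for every turn t of one dialogue.
    `state_features` is a (T, feat_dim) tensor of per-turn features phi(s_t)."""
    mu = torch.zeros_like(state_features)
    running = torch.zeros(state_features.size(1))
    for t in reversed(range(state_features.size(0))):
        running = state_features[t] + gamma * running
        mu[t] = running
    return mu


# --- toy training step (all shapes and hyper-parameters are illustrative) ---
vocab_size, feat_dim, num_turns = 100, 32, 6
model = DialogueSeq2Seq(vocab_size, feat_dim=feat_dim)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

src = torch.randint(0, vocab_size, (1, 10))           # previous dialogue context
tgt = torch.randint(0, vocab_size, (1, 8))            # next utterance (teacher forcing)
phi = torch.randn(num_turns, feat_dim)                # per-turn state features phi(s_t)
mu_target = discounted_feature_expectations(phi)[0]   # target for the current turn

token_logits, future_pred = model(src, tgt[:, :-1])
ce = nn.functional.cross_entropy(token_logits.reshape(-1, vocab_size),
                                 tgt[:, 1:].reshape(-1))
aux = nn.functional.mse_loss(future_pred.squeeze(0), mu_target)
loss = ce + 0.5 * aux   # auxiliary weight is a tunable hyper-parameter
loss.backward()
optim.step()
```

The key design point is that the auxiliary target depends on future turns, so the encoder state cannot collapse two dialogue histories that lead to different futures without incurring a loss; how the per-turn features phi(s_t) are defined is an assumption here and would follow the SCIRL-inspired construction described in the paper.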