Conversations Powered by Cross-Lingual Knowledge

  • Weiwei Sun,
  • Chuan Meng,
  • Qi Meng,
  • Zhaochun Ren,
  • Pengjie Ren,
  • Zhumin Chen,
  • Maarten de Rijke

SIGIR 2021

Knowledge-grounded conversation aims to improve the informativeness of generated responses by leveraging external knowledge. Most existing approaches work only in scenarios with massive monolingual knowledge sources. For languages with limited knowledge sources, using knowledge in the same language is not effective for generating informative responses. To address this problem, we propose the task of cross-lingual knowledge grounded conversation (CKGC), in which we leverage large-scale knowledge sources in another language to generate informative responses. The task comes with two main challenges: (1) knowledge selection and response generation in a cross-lingual setting; and (2) the lack of a test dataset for evaluation.

To tackle the first challenge, we propose a curriculum self-knowledge distillation (CSKD) scheme, which uses a large-scale dialogue corpus in an auxiliary language to improve cross-lingual knowledge selection and knowledge expression in the target language via knowledge distillation. To tackle the second challenge, we collect a cross-lingual knowledge grounded conversation test dataset to facilitate future research on this task. Extensive experiments on the newly created dataset verify the effectiveness of the proposed CSKD scheme for cross-lingual knowledge grounded conversation. In addition, we find that our proposed unsupervised method significantly outperforms state-of-the-art baselines in cross-lingual knowledge selection.
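To make the distillation idea concrete, the sketch below shows one common way such a scheme can be set up: a teacher that selects knowledge with auxiliary-language supervision guides a student that must select the same knowledge in the cross-lingual setting, with a curriculum weight shifting emphasis away from the teacher over training. This is a minimal illustration under assumed names (`knowledge_distillation_loss`, `curriculum_weight`, `gen_loss`), not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def knowledge_distillation_loss(student_logits: torch.Tensor,
                                teacher_logits: torch.Tensor,
                                temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between teacher and student knowledge-selection
    distributions over the same pool of candidate knowledge sentences."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

def curriculum_weight(step: int, total_steps: int) -> float:
    """Hypothetical linear curriculum: rely heavily on the teacher early in
    training and shift weight toward the task loss as training progresses."""
    return max(0.0, 1.0 - step / total_steps)

# Illustrative combination with a response-generation loss `gen_loss`:
# loss = gen_loss + curriculum_weight(step, total_steps) * \
#        knowledge_distillation_loss(student_logits, teacher_logits)
```

The exact distillation objective, curriculum schedule, and how the auxiliary-language corpus is used are design choices of the paper; the snippet only illustrates the general teacher-student pattern the abstract describes.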