MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning

Published at NAACL 2021

The combination of multilingual pre-trained representations and cross-lingual transfer learning is one of the most effective methods for building functional NLP systems for low-resource languages. However, for extremely low-resource languages without large-scale monolingual corpora for pre-training or sufficient annotated data for fine-tuning, transfer learning remains an under-studied and challenging task. Moreover, recent work shows that multilingual representations are surprisingly disjoint across languages (Singh et al., 2019), bringing additional challenges for transfer onto extremely low-resource languages. In this paper, we propose MetaXL, a meta-learning based framework that learns to transform representations judiciously from the source language to a target one and brings their representation spaces closer for effective transfer. Extensive experiments on real-world low-resource languages, without access to large-scale monolingual corpora or large amounts of labeled data, on tasks like cross-lingual sentiment analysis and named entity recognition show the effectiveness of our approach. Code for MetaXL is publicly available at github.com/microsoft/MetaXL.
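At a high level, MetaXL trains a small representation transformation network by bilevel optimization: an inner step fine-tunes the task model on transformed source-language data, and an outer step updates the transformation network so that this inner update reduces the loss on a handful of target-language examples. The sketch below is a minimal, self-contained illustration of that update, assuming PyTorch; the toy linear encoder, tensor sizes, and names such as task_loss and metaxl_step are illustrative assumptions, not the released code, which operates on a pretrained multilingual transformer.

```python
# Minimal sketch of a MetaXL-style bilevel update (assumed setup, not the
# released implementation). A toy linear encoder stands in for the
# pretrained multilingual model.
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, CLASSES = 32, 2  # illustrative sizes

encoder = nn.Linear(HIDDEN, HIDDEN)      # stand-in for a pretrained encoder layer
classifier = nn.Linear(HIDDEN, CLASSES)  # task head
transform = nn.Sequential(               # representation transformation network
    nn.Linear(HIDDEN, HIDDEN // 2), nn.ReLU(), nn.Linear(HIDDEN // 2, HIDDEN)
)
meta_opt = torch.optim.Adam(transform.parameters(), lr=1e-3)

def task_loss(x, y, enc_w, enc_b, cls_w, cls_b, use_transform):
    h = F.linear(x, enc_w, enc_b)        # encoder representations
    if use_transform:
        h = transform(h)                 # rewrite source-language representations
    return F.cross_entropy(F.linear(h, cls_w, cls_b), y)

def metaxl_step(src_x, src_y, tgt_x, tgt_y, lr_inner=0.1):
    params = [encoder.weight, encoder.bias, classifier.weight, classifier.bias]
    # Inner step: task loss on transformed source-language data. create_graph
    # keeps the simulated SGD step differentiable, so the target loss can
    # later backpropagate into the transformation network's parameters.
    inner = task_loss(src_x, src_y, *params, use_transform=True)
    grads = torch.autograd.grad(inner, params, create_graph=True)
    fast = [p - lr_inner * g for p, g in zip(params, grads)]
    # Outer (meta) step: evaluate the virtually updated model on
    # target-language data, passed through untransformed, and update only
    # the transformation network on that loss.
    meta_opt.zero_grad()
    task_loss(tgt_x, tgt_y, *fast, use_transform=False).backward()
    meta_opt.step()

# Toy usage: random batches standing in for source- and target-language data.
src_x, src_y = torch.randn(8, HIDDEN), torch.randint(0, CLASSES, (8,))
tgt_x, tgt_y = torch.randn(4, HIDDEN), torch.randint(0, CLASSES, (4,))
metaxl_step(src_x, src_y, tgt_x, tgt_y)
```

The key design choice this sketch tries to capture is that only the transformation network receives meta-gradients: the base model is updated on source data in the inner step, while the transform is tuned so that this update transfers well to the target language.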

Publication Downloads

Meta Representation Transformation for Low-resource Cross-Lingual Learning [Code]

May 24, 2021

Source code release for the paper published at NAACL 2021: "MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning" (abstract above).