Neural Machine Translation of Spoken-Dialects

Spoken language translation is often limited by the scarcity of parallel data. We introduce a novel approach to generating synthetic data for training Neural Machine Translation systems on spoken dialects. The proposed approach transforms a given parallel corpus between a written language and a target language into a parallel corpus between a spoken dialect variant and the target language. This work is shipped in Microsoft Translator for some languages and is described in the paper "Synthetic Data for Neural Machine Translation of Spoken-Dialects".
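To make the shape of the corpus transformation concrete, here is a minimal sketch in Python. It is not the method from the paper, whose transformation details are not given here. It assumes, purely for illustration, that the written-to-dialect step can be stood in for by a word-level substitution table. The names `make_lexical_transform`, `to_dialect_corpus`, and the placeholder tokens are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A parallel corpus is a list of (source sentence, target sentence) pairs.
ParallelCorpus = List[Tuple[str, str]]


def make_lexical_transform(lexicon: Dict[str, str]) -> Callable[[str], str]:
    """Build a toy written-to-dialect rewriter from a word substitution table.

    ASSUMPTION: a lookup table is only a stand-in for whatever
    transformation the actual system uses.
    """
    def transform(sentence: str) -> str:
        # Replace each whitespace-separated token if the table has an entry.
        return " ".join(lexicon.get(token, token) for token in sentence.split())
    return transform


def to_dialect_corpus(
    corpus: ParallelCorpus, to_dialect: Callable[[str], str]
) -> ParallelCorpus:
    """Rewrite only the source side; the target side is kept unchanged."""
    return [(to_dialect(source), target) for source, target in corpus]


# Placeholder tokens, not real language data.
written_corpus: ParallelCorpus = [
    ("written_a written_b shared_c", "target sentence one"),
    ("shared_c written_a", "target sentence two"),
]
toy_lexicon = {"written_a": "spoken_a", "written_b": "spoken_b"}

dialect_corpus = to_dialect_corpus(written_corpus, make_lexical_transform(toy_lexicon))
for source, target in dialect_corpus:
    print(source, "|||", target)
```

The point of the sketch is the data flow: only the source side of each pair is rewritten, so the existing target-side translations are reused as-is and the result can be fed to an ordinary NMT training pipeline.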