Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text

  • Sunayana Sitaram
  • Sai Krishna Rallabandi
  • Shruti Rijhwani
  • Alan W Black

Speech Synthesis Workshop 9

Most Text-to-Speech (TTS) systems today assume that the input
is in a single language written in its native script, which is the
language that the TTS database is recorded in. However, due
to the rise in conversational data available from social media,
phenomena such as code-mixing, in which multiple languages
are used together in the same conversation or sentence, are now
seen in text. TTS systems capable of synthesizing such text
need to be able to handle multiple languages at the same time,
and may also need to deal with noisy input. Previously, we
proposed a framework to synthesize code-mixed text using
a TTS database in a single language. The framework identifies
the language that each word is from, normalizes the spellings of a
language written in a non-standardized script, and maps the phonetic
space of the mixed-in language to that of the language the TTS
database was recorded in. We extend this cross-lingual approach to
more language pairs and improve upon our language identification
technique. We also conduct listening tests to determine which of the
two languages being mixed should be used as the target language.
We perform experiments on code-mixed Hindi-English
and German-English text and conduct listening tests with bilingual
speakers of these languages. From our subjective experiments,
we find that for code-mixed Hindi-English text, listeners have a
strong preference for cross-lingual systems with Hindi as the target
language. For code-mixed German-English text, we find that listeners
prefer cross-lingual systems built on an English TTS database that
can also synthesize the German words.
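The three front-end steps of the framework (spelling normalization, word-level language identification, and phonetic mapping into the target voice's phone set) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicons, the spelling-variant table, the English-to-Hindi phone map, the phone symbols, and the function names `identify_language` and `to_target_phones` are all hypothetical toy stand-ins. The sketch uses Hindi as the target language, matching the preference reported for Hindi-English text, and replaces real language identification and grapheme-to-phoneme conversion with simple dictionary lookups.

```python
from typing import Dict, List, Tuple

# Hypothetical spelling-variant table for Romanized Hindi (toy values).
SPELLING_VARIANTS: Dict[str, str] = {"kl": "kal", "hy": "hai"}

# Hypothetical toy lexicons (word -> phone sequence) standing in for real
# grapheme-to-phoneme front ends; the phone symbols are illustrative only.
HINDI_LEXICON: Dict[str, List[str]] = {
    "kal": ["k", "a", "l"],
    "hai": ["h", "ɛ"],
}
ENGLISH_LEXICON: Dict[str, List[str]] = {
    "meeting": ["m", "i", "t", "ɪ", "ŋ"],
    "today": ["t", "ə", "d", "eɪ"],
}

# Hypothetical map from English phones to the closest phones in the Hindi
# voice's inventory (e.g. alveolar /t, d/ -> retroflex /ʈ, ɖ/).
# Phones missing from the map are assumed to exist in both inventories.
EN_TO_HI: Dict[str, str] = {"t": "ʈ", "d": "ɖ", "eɪ": "e"}


def identify_language(word: str) -> str:
    """Toy word-level language ID: English lexicon lookup, else Hindi."""
    return "en" if word in ENGLISH_LEXICON else "hi"


def to_target_phones(sentence: str) -> List[Tuple[str, str, List[str]]]:
    """Normalize, tag, and express every word in the Hindi phone space."""
    result = []
    for raw in sentence.lower().split():
        word = SPELLING_VARIANTS.get(raw, raw)  # spelling normalization
        lang = identify_language(word)          # language identification
        if lang == "en":
            # Map each English phone onto the target (Hindi) inventory.
            phones = [EN_TO_HI.get(p, p) for p in ENGLISH_LEXICON[word]]
        else:
            # Unknown Hindi words fall back to letters, a crude G2P stand-in.
            phones = HINDI_LEXICON.get(word, list(word))
        result.append((word, lang, phones))
    return result


if __name__ == "__main__":
    for word, lang, phones in to_target_phones("kl meeting hy"):
        print(word, lang, " ".join(phones))
```

In a real system, each lookup would be replaced by a trained component. The point of the sketch is only the ordering: spellings are normalized before identification, and English words are mapped into the Hindi voice's phone set rather than synthesized with a second voice.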