Automatic Detection of Code-switching Style from Acoustics

Workshop on Computational Approaches to Linguistic Code Switching, 2018

Multilingual speakers switch between languages, displaying inter-sentential, intra-sentential, and congruent lexicalization-based transitions. While monolingual ASR systems may be able to recognize a few words from a foreign language, they are usually not robust enough to handle these varied styles of code-switching. There is also a lack of large code-switched speech corpora that capture all of these styles, which makes it difficult to build code-switched speech recognition systems. We hypothesize that it may be useful for an ASR system to first detect the switching style of a particular utterance from acoustics, and then use specialized language models or other adaptation techniques to decode the speech. In this paper, we address the first problem: detecting code-switching style from acoustics. We classify code-switched Spanish-English and Hindi-English corpora using two metrics and show that features extracted from acoustics alone can distinguish between different kinds of code-switching in these language pairs.