June 17, 2018

Interspeech 2018 Special Session: Low Resource Speech Recognition Challenge for Indian Languages

Organizing Committee:

Kalika Bali
Krishna Doss Mohan
Rupesh Kumar Mehta
Niranjan Nayak
Sunayana Sitaram
Radhakrishnan Srikanth

Data preparation and baselines:

Brij Mohan Lal Srivastava
Pallavi Matani
Sandeepkumar Satpal
Satarupa Guha
Shambo Chatterjee
Swapnajeet Padhi

In keeping with the Interspeech 2018 theme of ‘Speech Research for Emerging Markets in Multilingual Societies’, we are organizing a special session and challenge on speech recognition for low resource languages. Most languages in the world lack the text, speech and linguistic resources required to build large Deep Neural Network (DNN)-based models. However, many advances in DNN architectures, cross-lingual and multilingual speech processing techniques, and approaches that incorporate linguistic knowledge into machine-learning-based models can help in building systems for low resource languages. In this challenge, we will focus on building Automatic Speech Recognition (ASR) systems for Indian languages with constraints on the data available for Acoustic Modeling and Language Modeling.

India has around 1500 languages, of which 22 have been given the status of official languages by the Government of India. According to the 2001 census, 29 Indian languages have more than a million speakers. Most of these languages, except for Hindi, are low resource. Many of these do not have a written script, so speech technology solutions would greatly benefit the communities that speak them. To truly support speech and language systems that can be used by everyone in the country, we need techniques for building systems in these resource-constrained settings that also exploit the unique properties of, and similarities between, Indian languages.

We are releasing data in Telugu, Tamil and Gujarati. Participants in this challenge will be required to use only the released data to build ASR systems in these languages, which will make the task fair for all participants and direct the focus of the work to the low resource setting. However, participants will not be restricted to working on any one component of the ASR pipeline: they will be free to innovate in any aspect of the ASR system as long as they use only the data provided. We will release a baseline system that participants can compare their systems against and use as a starting point. During testing, we will release a held-out blind test set on which the systems will be evaluated.

Contact us: interspeech2018@microsoft.com