Digital Labor: Project Karya

Rajasthani Hindi Speech Data

April 2021

This dataset consists of audio recordings of participants reading stories aloud in Rajasthani Hindi, one sentence at a time. The 98 participants were from Soda, Rajasthan, and each read 30 stories, for a total of 426,873 recordings. Roughly 58 participants were male and 40 were female.

Download Dataset

Odia Speech Data and Model

October 2021

As part of this release, Navana Tech and Microsoft Research India are open-sourcing a validated Odia speech dataset of 1,648 hours, along with a baseline model for Odia speech recognition. The dataset consists of recordings in the agriculture, banking, and healthcare domains, covering four dialects of Odia collected from five districts. See the README.md file for more details.

*Note that the dataset download link cannot be opened directly in a browser.

Download link

How do I use a download link for an entire dataset?

A download link for an entire dataset provides the location of the dataset in Azure together with a special time-limited key that allows you to download the entire dataset. Copy the link from the button above and use it with a tool that can copy files from Azure, such as:

AzCopy – a command-line tool for Windows or Linux that copies files to and from Azure.
Azure Storage Explorer – a utility that is used to manage Azure storage.
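Before starting a large download, it can help to confirm that the time-limited key is still valid and to build the AzCopy invocation programmatically. The sketch below assumes the link is a standard Azure Blob Storage shared access signature (SAS) URL, whose `se` query parameter holds the key's expiry time in UTC. The helper names (`sas_expiry`, `is_expired`, `azcopy_command`) and the example URL are hypothetical illustrations, not part of this release or of any Azure SDK.

```python
"""Minimal sketch: check an Azure SAS download link and build an AzCopy command.

Assumption: the dataset link is a standard Azure Blob Storage SAS URL whose
query string carries the key's expiry time in the `se` parameter.
The helper names below are hypothetical, not part of any Azure SDK.
"""
import shlex
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import parse_qs, urlsplit


def sas_expiry(url: str) -> datetime:
    """Return the expiry time encoded in the SAS token's `se` parameter (UTC)."""
    query = parse_qs(urlsplit(url).query)  # also percent-decodes values such as %3A
    if "se" not in query:
        raise ValueError("no 'se' (signed expiry) parameter; is this a SAS link?")
    raw = query["se"][0]
    # Expiry values look like 2021-12-31T00:00:00Z or 2021-12-31; older
    # versions of datetime.fromisoformat() reject a trailing "Z", so swap it.
    expiry = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # SAS times are in UTC
    return expiry


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the link's time-limited key has already expired."""
    now = now or datetime.now(timezone.utc)
    return now >= sas_expiry(url)


def azcopy_command(url: str, destination: str) -> List[str]:
    """Arguments for `azcopy copy` that download everything under the link."""
    return ["azcopy", "copy", url, destination, "--recursive"]


if __name__ == "__main__":
    # Hypothetical placeholder link; paste the real one copied from the button.
    link = ("https://example.blob.core.windows.net/dataset"
            "?sv=2020-08-04&sr=c&sp=rl&se=2021-12-31T00%3A00%3A00Z&sig=PLACEHOLDER")
    print("Key expires:", sas_expiry(link).isoformat())
    if is_expired(link):
        print("This link's key has expired.")
    else:
        print("Run:", shlex.join(azcopy_command(link, "./dataset")))
```

To run the download from Python, the argument list can be passed to `subprocess.run(cmd, check=True)`. Keeping the command as a list rather than a single string means the `&` characters in the SAS query string are never interpreted by a shell; `shlex.join` is used only to print a safely quoted version for copying into a terminal.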