CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

  • Shinji Watanabe
  • Michael Mandel
  • Jon Barker
  • Emmanuel Vincent
  • Ashish Arora
  • Xuankai Chang
  • Sanjeev Khudanpur
  • Vimal Manohar
  • Daniel Povey
  • Desh Raj
  • David Snyder
  • Aswin Shanmugam Subramanian
  • Jan Trmal
  • Bar Ben Yair
  • Christoph Boeddeker
  • Zhaoheng Ni
  • Yusuke Fujita
  • Shota Horiguchi
  • Takuya Yoshioka
  • Neville Ryant

arXiv:2004.09249

Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6). The new challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition in everyday home environments. The speech material is the same as in the previous CHiME-5 recordings, except that the arrays are now accurately synchronized. The material was elicited using a dinner party scenario, with efforts taken to capture data that is representative of natural conversational speech. This paper describes the CHiME-6 baselines for both segmented multispeaker speech recognition (Track 1) and unsegmented multispeaker speech recognition (Track 2). Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open source baselines providing speech enhancement, speaker diarization, and speech recognition modules.