The Speech Research Team is part of the Azure Cognitive Services Research (CSR) group and is responsible for fundamental advances in audio, speech, and spoken language processing technologies. We also work closely with engineering and product teams to bring new technologies into Microsoft products.

We work on a wide range of speech processing problems, including speech enhancement, speech recognition, speaker diarization, multilingual speech recognition, spoken language understanding, end-to-end modeling, self-supervised learning, and multi-modal modeling. Our recent work covers the following topics:

  • Deep learning-based real-time speech enhancement
  • Monaural and multi-channel speech separation for meeting transcription
  • Ad hoc microphone arrays
  • End-to-end modeling for speaker-attributed speech recognition
  • Unified speech representation learning
  • Speech-language pre-training

The results of our work are delivered to Microsoft speech technologies and integrated into various products. We have also contributed to the development of new services, such as Conversation Transcription in Azure Cognitive Services, which powers the transcription features of several Microsoft products. We received the IEEE Signal Processing Society Conference Best Paper Award for Industry at ICASSP 2022. Our work also led to first place in the speaker diarization track of VoxSRC-20 (joint work with other Microsoft scientists and Microsoft Research researchers) and to breakthrough human parity performance on the Switchboard conversational speech recognition task.