The Speech Research Team is part of the Azure AI Cognitive Services Research (CSR) group and is responsible for fundamental advances in audio, speech, and spoken language processing technologies. We also work closely with engineering and product teams to bring these new technologies into Microsoft products.
We work on a wide range of speech processing problems, including speech enhancement, speech recognition, speaker diarization, multilingual speech recognition, spoken language understanding, end-to-end modeling, and self-supervised learning. Our recent work covers the following topics:
- Deep learning-based real-time speech enhancement
- Monaural and multi-channel speech separation for meeting transcription
- Ad hoc microphone arrays
- End-to-end modeling for speaker-attributed speech recognition
- Unified speech representation learning
- Speech-language pre-training
The results of our work are delivered to Microsoft speech technologies and integrated into various products. We have also contributed to the development of new services, such as Conversation Transcription in Azure Cognitive Services, which powers the transcription features of several Microsoft products. Our work won first place in the speaker diarization track of VoxSRC-20 (joint work with other Microsoft researchers) and achieved breakthrough human parity performance on the Switchboard conversational speech recognition task.
The former Speech and Dialog Research Group (SDRG) was merged with the Azure Computer Vision Group in 2020 to form the Cognitive Services Research Group.