The Microsoft 2017 Conversational Speech Recognition System

  • Lingfeng Wu
  • Jasha Droppo
  • Xuedong Huang
  • Andreas Stolcke

Proc. IEEE ICASSP

Published by IEEE

We describe the latest version of Microsoft’s conversational speech recognition system for the Switchboard and CallHome domains. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog-session-aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby acoustic model posteriors are first combined at the senone/frame level, followed by word-level voting via confusion networks. We also added another language model rescoring step following the confusion network combination. The resulting system yields a 5.1% word error rate on the NIST 2000 Switchboard test set, and 9.8% on the CallHome subset.
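
The two-stage combination described above can be illustrated with a minimal Python sketch. It is not the system's implementation: the abstract does not state how the posteriors are weighted or how the confusion networks are built, so the weighted linear average in stage 1, the pre-aligned slots in stage 2, and the function names `combine_frame_posteriors` and `vote_confusion_network` are all assumptions made for illustration.

```python
from collections import defaultdict


def combine_frame_posteriors(model_posteriors, weights=None):
    """Stage 1 (assumed form): weighted average of senone posteriors per frame.

    model_posteriors: one entry per acoustic model; each entry is a list of
    frames, and each frame is a list of senone probabilities.
    """
    n_models = len(model_posteriors)
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / n_models] * n_models
    combined = []
    for t in range(len(model_posteriors[0])):
        n_senones = len(model_posteriors[0][t])
        frame = [
            sum(w * post[t][s] for w, post in zip(weights, model_posteriors))
            for s in range(n_senones)
        ]
        total = sum(frame)
        # Renormalize so each combined frame is a proper distribution.
        combined.append([p / total for p in frame])
    return combined


def vote_confusion_network(slots):
    """Stage 2 (simplified): score-weighted voting over pre-aligned word slots.

    slots: list of slots; each slot is a list of (word, score) pairs, one per
    subsystem. The empty string stands for "no word" in that slot.
    """
    result = []
    for slot in slots:
        totals = defaultdict(float)
        for word, score in slot:
            totals[word] += score
        best = max(totals, key=totals.get)
        if best:  # skip slots where "no word" wins the vote
            result.append(best)
    return result
```

In this sketch, stage 1 would feed a decoder that produces one hypothesis set per combined model group, and stage 2 would then vote across those sets. The alignment of hypotheses into slots, which a real confusion network construction performs, is taken as given here.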