Large-Margin Minimum Classification Error Training for Large-Scale Speech Recognition Tasks

  • Dong Yu,
  • Li Deng,
  • Xiaodong He,
  • Alex Acero

Proceedings of ICASSP, Honolulu, Hawaii

Published by IEEE

Recently, we developed a novel discriminative training method named large-margin minimum classification error (LM-MCE) training, which incorporates the idea of a discriminative margin into the conventional minimum classification error (MCE) training method. In our previous work, this approach was formulated specifically for MCE training with the sigmoid loss function, and its effectiveness was demonstrated only on the TIDIGITS task. In this paper, we make two additional contributions. First, we formulate LM-MCE as a Bayes risk minimization problem whose loss function includes not only empirical error rates but also a margin-bound risk. This new formulation allows us to extend the same technique to a wide variety of MCE-based training methods. Second, we have successfully applied the LM-MCE training approach to the Microsoft internal large-vocabulary telephony speech recognition task (with 2,000 hours of training data and a 120K-word vocabulary) and achieved significant recognition accuracy improvements across the board. To the best of our knowledge, this is the first time that the large-margin approach has been demonstrated to be successful on large-scale speech recognition tasks.
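The core idea described in the abstract (a margin term added to the sigmoid loss of conventional MCE training) can be illustrated with a minimal sketch. The function name, signature, and parameter values below are illustrative assumptions, not the paper's exact formulation; in MCE training the sigmoid smooths the 0-1 loss of a misclassification measure d, and the large-margin variant shifts d by a margin so correct classifications must win by a fixed amount to incur near-zero loss.

```python
import math

def mce_sigmoid_loss(d, gamma=1.0, margin=0.0):
    """Smoothed 0-1 loss used in MCE-style training (illustrative sketch).

    d      -- misclassification measure (d > 0 means the token is misclassified)
    gamma  -- slope of the sigmoid smoothing
    margin -- discriminative margin; LM-MCE effectively shifts the decision
              boundary so that correct tokens must win by at least `margin`
              before the loss approaches zero
    """
    return 1.0 / (1.0 + math.exp(-gamma * (d + margin)))

# With margin = 0 this reduces to the conventional MCE sigmoid loss:
# a token exactly on the decision boundary (d = 0) incurs loss 0.5.
# With margin > 0, even correctly classified tokens near the boundary
# (small negative d) are penalized, pushing training toward larger margins.
```

For example, a token with d = -0.5 (correct, but by a small margin) has loss ≈ 0.38 under the conventional loss (margin = 0, gamma = 1) but ≈ 0.62 once a margin of 1.0 is imposed, so the optimizer keeps pressure on such borderline tokens.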