Robust Speech Recognition by Normalization of the Acoustic Space

  • Alex Acero
  • Richard Stern

Proc. of the International Conference on Acoustics, Speech and Signal Processing

Published by Institute of Electrical and Electronics Engineers, Inc.

In this paper we present several algorithms that increase the robustness of SPHINX, the CMU continuous-speech, speaker-independent recognition system, by normalizing the acoustic space via minimization of the overall VQ distortion. We propose an affine transformation of the cepstrum in which a matrix multiplication performs frequency normalization and a vector addition performs environment normalization. The environment normalization algorithms are very efficient, and they dramatically improve recognition accuracy when the system is tested on a microphone other than the one on which it was trained. The frequency normalization algorithm applies a different warping of the frequency axis to each speaker and achieves a 10% decrease in error rate.
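
The affine form described above can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes a Euclidean distortion measure, a toy two-dimensional codebook, and a simple alternating procedure that estimates only the additive vector b. The matrix A is supplied by the caller, and all function names (affine, estimate_bias, vq_distortion) are hypothetical.

```python
def affine(x, A, b):
    """Apply y = A x + b to one cepstral vector (plain lists of floats)."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def sq_dist(u, v):
    """Squared Euclidean distance (an assumed distortion measure)."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def nearest(v, codebook):
    """Return the codeword closest to v."""
    return min(codebook, key=lambda c: sq_dist(v, c))


def vq_distortion(frames, codebook, b):
    """Average VQ distortion of the frames after adding the bias b."""
    total = 0.0
    for x in frames:
        y = [xi + bi for xi, bi in zip(x, b)]
        total += sq_dist(y, nearest(y, codebook))
    return total / len(frames)


def estimate_bias(frames, codebook, iters=10):
    """Alternate between assigning frames to codewords and re-estimating b.

    For fixed assignments, the b minimizing the squared error is the mean
    of (codeword - frame), so the distortion never increases.
    """
    dim = len(frames[0])
    b = [0.0] * dim
    for _ in range(iters):
        targets = [nearest([xi + bi for xi, bi in zip(x, b)], codebook)
                   for x in frames]
        b = [sum(c[d] - x[d] for c, x in zip(targets, frames)) / len(frames)
             for d in range(dim)]
    return b


# Toy data: frames are codewords shifted by a constant offset (1.0, -0.5),
# standing in for a mismatch between training and testing microphones.
codebook = [[0.0, 0.0], [4.0, 4.0]]
frames = [[1.0, -0.5], [5.0, 3.5], [1.0, -0.5], [5.0, 3.5]]

before = vq_distortion(frames, codebook, [0.0, 0.0])
bias = estimate_bias(frames, codebook)
after = vq_distortion(frames, codebook, bias)

# With A set to the identity, the affine transform reduces to adding the bias.
identity = [[1.0, 0.0], [0.0, 1.0]]
normalized = [affine(x, identity, bias) for x in frames]
```

In this sketch the frequency normalization would enter through a non-identity A, chosen per speaker; how that matrix is derived from a warping of the frequency axis is not specified here.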