Language Modeling for Speech Recognition

Established: January 29, 2004

Did I just say “It’s fun to recognize speech” or “It’s fun to wreck a nice beach”? It’s hard to tell, because the two sound almost the same. Of course, I am far more likely to say “recognize speech” than “wreck a nice beach.” Language models help a speech recognizer estimate how likely a word sequence is, independent of the acoustics. This lets the recognizer make the right guess when two different sentences sound alike.
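
To make the “recognize speech” example concrete, here is a minimal Python sketch. It is not the system described on this page. It assumes a tiny hypothetical training corpus, a bigram model with add-one smoothing, and made-up, equal acoustic scores for the two hypotheses (they sound alike). The names `sentence_logprob`, `pick_best`, and `lm_weight` are illustrative only. The recognizer picks the word sequence W that maximizes log P(A|W) + lm_weight * log P(W), where A is the audio, so when the acoustic scores tie, the language model decides.

```python
import math
from collections import Counter

# Hypothetical toy training text; a real recognizer's model is trained on far more data.
CORPUS = [
    "it's fun to recognize speech",
    "we recognize speech with a computer",
    "speech recognition is fun",
    "it's fun to walk on a nice beach",
]


def train_bigram(sentences):
    """Count bigram histories and bigrams, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    # Predictable words: every training word plus </s>, plus one slot for unseen words.
    vocab_size = len({w for s in sentences for w in s.split()} | {"</s>"}) + 1
    return histories, bigrams, vocab_size


HISTORIES, BIGRAMS, VOCAB_SIZE = train_bigram(CORPUS)


def sentence_logprob(sentence):
    """Natural-log P(W) under an add-one (Laplace) smoothed bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words[:-1], words[1:]):
        total += math.log((BIGRAMS[(prev, word)] + 1) / (HISTORIES[prev] + VOCAB_SIZE))
    return total


def pick_best(acoustic_logprob, lm_weight=1.0):
    """Noisy-channel choice: argmax over W of log P(A|W) + lm_weight * log P(W)."""
    return max(acoustic_logprob,
               key=lambda w: acoustic_logprob[w] + lm_weight * sentence_logprob(w))


# The two hypotheses sound about the same, so give them equal (made-up) acoustic scores.
candidates = {
    "it's fun to recognize speech": -120.0,
    "it's fun to wreck a nice beach": -120.0,
}
for hyp in candidates:
    print(f"{sentence_logprob(hyp):8.2f}  {hyp}")
print("best:", pick_best(candidates))
```

With equal acoustic scores, `pick_best` returns “it's fun to recognize speech”, because “recognize speech” occurs in the toy corpus and “wreck” never does. Add-one smoothing is used only for brevity; practical systems use stronger smoothing (Kneser-Ney, for example) and much more training text. Every extra word also costs probability, which is one reason real decoders tune a word insertion penalty alongside the language model weight.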

Our language modeling research falls into several categories:

Language Model Adaptation. Natural language technology in general, and language models in particular, are very brittle when moved from one domain to another. Current statistical language models are built from newspaper text and TV/radio broadcast transcripts, which have little to do with the way a particular individual uses language every day. We are investigating ways to adapt a general-domain statistical language model to a new domain or user when only a limited amount of sample data from that domain or user is available.
Can Syntactic Structure Help? Current language models make no use of the syntactic properties of natural language; instead, they rely on very simple statistics such as word co-occurrences. Recent results show that incorporating syntactic constraints into a statistical language model reduces the word error rate on a conventional dictation task by 10%. We are working on finding the best way of “putting language into language models,” and we are exploring the new possibilities that such structured language models open up for other tasks, such as speech and language understanding.
Speech Utterance Classification. A simple first step toward more natural user interfaces in interactive voice response systems is automated call routing. Instead of listening to prompts like “If you are trying to reach department X, say Yes; otherwise, say No” or pressing keys on the telephone keypad, a caller could simply describe the problem in a sentence, for example “There is a fraudulent transaction on my last statement,” and be connected to the right customer service representative. We are developing technology that classifies speech utterances into a limited set of classes, extending the traditional language model so that it also assigns a category to each utterance.
Building the best language models we can. In general, the better the language model, the lower the error rate of the speech recognizer. By combining the best available results in language modeling, we have created a language model that outperforms a standard baseline by 45%, leading to a 10% reduction in the error rate of our speech recognizer. This model has the best reported results of any language model.
Language modeling for other applications. Speech recognition is not the only use for language models. They are also useful in fields such as handwriting recognition, spelling correction, and even typing Chinese! As in speech recognition, the input in all of these areas is ambiguous in some way, and a language model can help us guess the most likely input. We’re also looking for new uses of language models in other areas.
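
One common way to adapt a general-domain model when only a little in-domain data is available is linear interpolation: P(w) = λ P_user(w) + (1 - λ) P_general(w), with the weight λ estimated by expectation-maximization (EM) on held-out in-domain text. The Python sketch below illustrates that textbook technique for the adaptation work described above; it is not necessarily the method used in this research. To stay short, it assumes unigram models with add-one smoothing, a closed vocabulary, and three hypothetical text snippets. The same recipe applies to each n-gram context of a larger model.

```python
import math
from collections import Counter


def unigram_model(text, vocab):
    """Add-one smoothed unigram distribution over a fixed (closed) vocabulary."""
    counts = Counter(text.split())
    denominator = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / denominator for w in vocab}


def em_interpolation_weight(p_user, p_general, heldout, iterations=50):
    """Estimate lam in P(w) = lam*P_user(w) + (1-lam)*P_general(w) by EM on held-out text."""
    lam = 0.5
    words = heldout.split()
    for _ in range(iterations):
        # E-step: posterior probability that each held-out word came from the user model.
        posteriors = [
            lam * p_user[w] / (lam * p_user[w] + (1 - lam) * p_general[w]) for w in words
        ]
        # M-step: the new weight is the average posterior.
        lam = sum(posteriors) / len(words)
    return lam


# Hypothetical texts: a general-domain sample, a small user sample, and held-out user text.
GENERAL_TEXT = "the president spoke about the economy and the market rose as the news spread"
USER_TEXT = "remind me to call mom and remind me to buy milk after work"
HELDOUT_TEXT = "remind me to read the news about the economy after work"

# Closed vocabulary (a simplifying assumption): every word in any of the three texts.
VOCAB = set(f"{GENERAL_TEXT} {USER_TEXT} {HELDOUT_TEXT}".split())

p_general = unigram_model(GENERAL_TEXT, VOCAB)
p_user = unigram_model(USER_TEXT, VOCAB)
lam = em_interpolation_weight(p_user, p_general, HELDOUT_TEXT)


def log_likelihood(text, weight):
    """Natural-log likelihood of text under the interpolated model with user weight `weight`."""
    return sum(math.log(weight * p_user[w] + (1 - weight) * p_general[w]) for w in text.split())


print(f"lambda = {lam:.3f}")
for name, weight in [("general only", 0.0), ("user only", 1.0), ("interpolated", lam)]:
    print(f"{name:>12}: {log_likelihood(HELDOUT_TEXT, weight):.2f}")
```

On this toy data the interpolated model gives the held-out text a higher log-likelihood than either model alone, because the held-out sentence mixes the user's phrasing (“remind me to”) with general vocabulary (“the economy”). Scoring on the same held-out text used to fit λ is optimistic; a real evaluation would use a separate test set.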

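For call routing, one simple way to make a language model assign a category is to train one language model per class and choose the class c that maximizes log P(c) + log P(W|c) for the recognized words W. The Python sketch below does this with add-one smoothed unigram class models, which amounts to a naive Bayes classifier. The three routing classes and their training utterances are hypothetical, the utterance is assumed to be already transcribed, and the sketch illustrates the idea rather than the technology described on this page.

```python
import math
from collections import Counter

# Hypothetical labeled utterances for a call-routing system (already transcribed, lowercase).
TRAINING = {
    "fraud": [
        "there is a fraudulent transaction on my statement",
        "someone used my card without permission",
        "i do not recognize this charge",
    ],
    "balance": [
        "what is my current balance",
        "how much money is in my account",
        "tell me my balance please",
    ],
    "lost_card": [
        "i lost my card",
        "my card was stolen",
        "i need a replacement card",
    ],
}


def train(training):
    """Build one add-one smoothed unigram model per class, plus a log prior per class."""
    vocab = {w for utts in training.values() for u in utts for w in u.split()}
    total_utts = sum(len(utts) for utts in training.values())
    models = {}
    for label, utts in training.items():
        counts = Counter(w for u in utts for w in u.split())
        models[label] = (math.log(len(utts) / total_utts), counts, sum(counts.values()))
    return models, len(vocab) + 1  # +1 reserves probability mass for unseen words


MODELS, VOCAB_SIZE = train(TRAINING)


def class_score(label, utterance):
    """log P(c) + log P(W | c) under the class-specific unigram model."""
    log_prior, counts, n_tokens = MODELS[label]
    return log_prior + sum(
        math.log((counts[w] + 1) / (n_tokens + VOCAB_SIZE)) for w in utterance.split()
    )


def classify(utterance):
    """Route the utterance to the class whose language model scores it highest."""
    return max(MODELS, key=lambda label: class_score(label, utterance))


print(classify("there is a fraudulent transaction on my last statement"))  # fraud
print(classify("i lost my card yesterday"))  # lost_card
```

A production router would more likely use n-gram class models trained on many transcribed calls; unigram models keep the example short while showing how a language model score doubles as a classification score.
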