Automatic learning of speech recognition grammars from example sentences to ease the development of spoken language systems.
Researcher Ye-Yi Wang wants to have more time for vacation, so he is teaching his computer to do some work for him.
Wang has been working on spoken language understanding for the MiPad project since he joined Microsoft Research. He has developed a robust parser and the understanding grammars for several projects. “Grammar development is painful and error-prone. It is time-consuming, tedious, and it requires expertise in computational linguistics. Occupied with the work to speech-enable applications, I’ve never had enough time to use up my three-week vacation in recent years,” says Wang.
According to Wang, many state-of-the-art conversational systems use semantic-based robust understanding. In this approach, computers “understand” speech by using a robust parser to normalize the output of a speech recognizer into a canonical semantic representation, guided by a handcrafted semantic grammar. While the robust parser can be written once and reused across tasks, a new semantic grammar must be developed for every application domain. Because of this, speech-enabled applications have mostly been developed as prototype research systems in large human language technology labs.
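To make the pipeline concrete, here is a minimal sketch of the frame-and-slot normalization described above. All names are hypothetical: the tiny `SEMANTIC_GRAMMAR` table, the `robust_parse` function, and the `ShowAppointment` frame are illustrative assumptions, not the MiPad parser or its grammar format.

```python
import re

# Toy handcrafted "semantic grammar": one frame with a trigger pattern and
# slot-filling patterns. Real semantic grammars are far richer; this only
# illustrates the normalization step.
SEMANTIC_GRAMMAR = {
    "ShowAppointment": {
        "trigger": re.compile(r"\b(show|list|display)\b.*\b(appointments?|meetings?)\b"),
        "slots": {
            "Date": re.compile(r"\b(today|tomorrow|monday|friday)\b"),
            "Person": re.compile(r"\bwith (\w+)\b"),
        },
    },
}

def robust_parse(utterance: str) -> dict:
    """Normalize recognizer output into a canonical frame-and-slot form,
    ignoring any words the grammar does not account for."""
    text = utterance.lower()
    for frame, rules in SEMANTIC_GRAMMAR.items():
        if rules["trigger"].search(text):
            slots = {}
            for slot, pattern in rules["slots"].items():
                match = pattern.search(text)
                if match:
                    slots[slot] = match.group(1)
            return {"frame": frame, "slots": slots}
    return {"frame": None, "slots": {}}

# Noisy, disfluent recognizer output still yields a usable canonical form:
print(robust_parse("um please show me all my meetings with alice tomorrow"))
# {'frame': 'ShowAppointment', 'slots': {'Date': 'tomorrow', 'Person': 'alice'}}
```

The “robust” part is that the parser extracts what the grammar covers and skips filler words and recognition noise, so varied or disfluent input still normalizes to the same canonical frame.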
“Microsoft is a platform company. It is extremely important to provide developers with easy-to-use tools for our platforms, so that speech-enabled applications and web services can become mainstream,” says Alex Acero, Wang’s manager, who is also involved in the project.
They focus on developing technologies for smart tools that allow an average developer to speech-enable applications or web services. This differs from work in automatic grammar inference, which tries to learn grammars automatically from a corpus of training sentences. Most research in grammar inference has focused on toy problems, and applying such approaches to grammar structure learning for natural language has not produced satisfactory results for understanding applications. According to Wang, the limited success is due to the complexity of the problem and the typical sparseness of the training data relative to the complexity of the target grammar: there is no good generalization mechanism to correctly cover the wide variety of language constructions unseen in the training data. “Instead of ambiguous automatic grammar inference, we adopt a very practical approach by integrating multiple sources of easy-to-get information,” says Wang.
Several general technologies are being pursued to take advantage of these information sources, including:
- Automatic generation of a template grammar from the semantic schema: The semantic schema defines the entity relations of a specific domain and serves as the specification for a language-enabled application. Their technology can automatically generate a context-free semantic grammar template that inherits the semantic information specified in the schema (a sketch illustrating this and the next item follows the list).
- Learning from semantic annotation: With the involvement of grammar developers and the help of the robust parser, a small number of training sentences can easily be annotated with their canonical representations. From these annotations, Fast Learner can learn the language expressions for the components of the automatically generated semantic grammar template.
- Syntactic constraints: Domain-specific language must still comply with the syntactic constraints of the general language. Simple syntactic clues, such as part-of-speech constraints, can be used to reduce the search space in grammar learning (see the second sketch after this list).
- Grammar library: Some low-level semantic entities, such as dates, times, durations, postal addresses, currencies, numbers, and percentages, are not domain-specific. They are universal building blocks that can be written once and then shared by many applications.
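As a rough illustration of the first two items, the sketch below derives a context-free grammar template from a toy semantic schema and then attaches phrasings learned from annotated sentences. The schema format, rule notation, and function names are assumptions made for this example; they are not the actual SGStudio or Fast Learner formats.

```python
# Toy semantic schema: frames and their typed slots for a calendar domain
# (an illustrative assumption, not the real MiPad schema).
SCHEMA = {
    "NewAppointment": {"Attendee": "Person", "StartTime": "Time"},
    "CancelAppointment": {"Meeting": "Appointment"},
}

def schema_to_template(schema: dict) -> list:
    """Emit one CFG rule per frame. Command phrasings and slot fillers are
    left as open nonterminals (<...>) to be learned from annotations or
    drawn from a shared grammar library (e.g. <Time>)."""
    rules = []
    for frame, slots in schema.items():
        rhs = f"<{frame}Command> " + " ".join(f"<{t}>" for t in slots.values())
        rules.append(f"<{frame}> -> {rhs}")
    return rules

def attach_learned_expressions(rules: list, learned: dict) -> list:
    """Add terminal rules for expressions induced from annotated sentences."""
    for nonterminal, phrases in learned.items():
        for phrase in phrases:
            rules.append(f"<{nonterminal}> -> {phrase}")
    return rules

template = schema_to_template(SCHEMA)
grammar = attach_learned_expressions(
    template,
    # toy stand-in for learner output: phrasings seen in annotated data
    {"NewAppointmentCommand": ["schedule a meeting", "set up an appointment"]},
)
print("\n".join(grammar))
# <NewAppointment> -> <NewAppointmentCommand> <Person> <Time>
# <CancelAppointment> -> <CancelAppointmentCommand> <Appointment>
# <NewAppointmentCommand> -> schedule a meeting
# <NewAppointmentCommand> -> set up an appointment
```

The template itself falls out of the schema mechanically; annotation supplies the command phrasings, and open nonterminals such as `<Time>` would be resolved by the shared grammar library.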
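And a second small sketch, this time of the part-of-speech idea from the third item: candidate slot fillers whose POS sequence is implausible are pruned before the learner considers them. The toy lexicon and allowed patterns are assumptions for illustration only.

```python
# Toy POS lexicon and per-slot POS patterns (illustrative assumptions).
POS_LEXICON = {"alice": "NNP", "bob": "NNP", "schedule": "VB",
               "meeting": "NN", "tomorrow": "NN"}
ALLOWED_POS = {"Person": {("NNP",)}, "Time": {("NN",)}}

def passes_pos_constraint(slot: str, phrase: str) -> bool:
    """Keep a candidate slot filler only if its POS tag sequence is one
    the slot allows; this shrinks the grammar learner's search space."""
    tags = tuple(POS_LEXICON.get(w, "UNK") for w in phrase.split())
    return tags in ALLOWED_POS.get(slot, set())

candidates = ["alice", "schedule", "tomorrow"]
print([c for c in candidates if passes_pos_constraint("Person", c)])
# ['alice'] -- the verb and the bare noun are pruned
```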
In their ASRU 2001 paper, “Grammar Learning for Spoken Language Understanding,” they reported some exciting results. On MiPad data, the grammar generated with their technologies already outperformed the manually developed grammar, consistently reducing understanding error rates by 40% to 60%.
“This is very promising, given that many more powerful technologies have not been applied yet,” says Wang. Acero agrees: “We believe that learning a statistical grammar can further improve the performance, and there are still many things on our agenda to reduce the interaction required between the toolkit and grammar developers.”
Based on these technologies, they have created SGStudio (Semantic Grammar Studio), which enables developers who are not speech experts to build semantic grammars for speech recognition and understanding.