State-of-the-art statistical spoken language processing typically requires significant manual effort, both to construct domain-specific schemas (ontologies) and to annotate training data against these schemas. At the same time, a recent surge of activity and progress on semantic web-related concepts from the large search-engine companies represents a potential alternative to this manually intensive design of spoken language processing systems. Standards such as schema.org have been established for schemas (ontologies) that webmasters can use to semantically and uniformly mark up their web pages. Search engines such as Bing, Google, and Yandex have adopted these standards and are leveraging them to build semantic search engines at the scale of the web. As a result, open linked data resources and semantic graphs covering various domains (such as Freebase [3]) have grown massively every year and now contain far more structured information than any single manually curated resource. Furthermore, these resources contain links to text data (such as Wikipedia pages) related to the knowledge in the graph.
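To make the markup idea concrete, the following is a minimal sketch of the kind of schema.org annotation a webmaster might embed in a page (here as JSON-LD, one of the supported serializations; the page content and property values are illustrative, not taken from a real site):

```python
import json

# A schema.org JSON-LD snippet of the kind a webmaster might embed in a
# page's <script type="application/ld+json"> element (values are made up).
page_markup = """
{
  "@context": "http://schema.org",
  "@type": "Movie",
  "name": "The Social Network",
  "director": {"@type": "Person", "name": "David Fincher"},
  "datePublished": "2010"
}
"""

# A semantic search engine, or a spoken language understanding system
# mining the web, can read the typed entity and its properties directly
# instead of inferring them from free text.
entity = json.loads(page_markup)
print(entity["@type"])             # Movie
print(entity["director"]["name"])  # David Fincher
```

Because the types and property names come from a shared vocabulary, the same extraction code works across pages from different webmasters, which is what makes aggregation at web scale feasible.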
Recently, several studies on spoken language processing have started exploiting these massive linked data resources for language modeling and spoken language understanding. This tutorial will include a brief introduction to the semantic web and the linked data structure, the available resources, and querying languages. An overview of related work on information extraction and language processing will be presented, with the main focus on methods for learning spoken language understanding models from these resources.
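Underlying both the graphs and their query languages is the triple data model: knowledge is stored as (subject, predicate, object) statements, and a query language such as SPARQL matches variable patterns against them. The sketch below illustrates this with a toy in-memory triple list and a simplified pattern matcher (the entity and predicate names are invented, not real Freebase or schema.org identifiers):

```python
# Linked data resources store knowledge as (subject, predicate, object)
# triples; query languages such as SPARQL match patterns over them.
# These triples are toy examples, not real knowledge-graph identifiers.
triples = [
    ("the_social_network", "type", "film"),
    ("the_social_network", "directed_by", "david_fincher"),
    ("fight_club", "type", "film"),
    ("fight_club", "directed_by", "david_fincher"),
]

def match(pattern, triples):
    """Return variable bindings for a single triple pattern.

    Elements of `pattern` starting with '?' are variables (SPARQL-style);
    all other elements must match the triple exactly.
    """
    results = []
    for triple in triples:
        bindings = {}
        for p, v in zip(pattern, triple):
            if p.startswith("?"):
                bindings[p] = v
            elif p != v:
                break
        else:
            results.append(bindings)
    return results

# Analogous to the SPARQL query:
#   SELECT ?film WHERE { ?film directed_by david_fincher }
films = match(("?film", "directed_by", "david_fincher"), triples)
print([b["?film"] for b in films])  # ['the_social_network', 'fight_club']
```

Real SPARQL additionally supports joins over multiple patterns, filters, and federation across endpoints, but the pattern-to-bindings step shown here is the core operation that spoken language understanding systems rely on when mining these graphs for training data.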