Automated Text Summarization (SIGIR’99 Tutorial)
- Eduard Hovy
- Chin-Yew Lin
- Daniel Marcu
Tutorial
SIGIR'99 Tutorial site: https://groups.ischool.berkeley.edu/archive/sigir99/tutorials.html
In this tutorial, we review the state of the art in automatic text summarization and discuss and critically evaluate current approaches to the problem. The tutorial is structured as follows:
- The need for text summarization.
- What is a summary, exactly? What types are there? We outline a typology of summaries, including the following distinctions: indicative vs. informative; abstract vs. extract; generic vs. query-oriented; background vs. just-the-news; single-document vs. multi-document; and so on.
- An overview of the principal paradigms and approaches. We describe the typical decomposition of summarization into three stages, and explain in detail the major approaches to each stage. We contrast the strengths and weaknesses of the statistical/IR-based and the AI/NLP-based paradigms.
- Topic identification. For this stage, we outline techniques based on stereotypical text structure, cue words, high-frequency indicator phrases, intratext connectivity, and discourse structure centrality. We provide detailed examples together with measures of effectiveness.
- Topic fusion. For this stage, we outline some ideas that have been proposed, including concept generalization and semantic association, and describe the inherent problems of relying on large-scale world knowledge.
- Summary generation. For this stage, we outline the problems of sentence planning to achieve information compaction and to ensure coherence of the resulting summary.
- Evaluation: how good is a summary? Evaluation is a difficult issue. We describe various suggested measures and discuss the adequacy of current evaluation methods. We illustrate the measurement of individual features and show that some features are surprisingly poor predictors of importance while others are surprisingly good ones.
- The future. Finally, we present a set of open problems that we perceive as being crucial for immediate progress in automatic summarization. Throughout, we highlight the strengths and weaknesses of statistical and symbolic/linguistic techniques in implementing efficient summarization systems. We discuss ways in which summarization systems can interact with and/or complement information extraction and information retrieval systems.
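The topic-identification techniques listed above can be combined into a simple extractive scorer. The following Python sketch is an illustration under stated assumptions, not the tutorial's own method: it approximates stereotypical text structure by sentence position (earlier is better), checks a small hypothetical cue-phrase list (`CUE_PHRASES`), and measures high-frequency content words after removing a hypothetical stopword list. The function names and the equal default weights are illustrative choices.

```python
import re
from collections import Counter

# Hypothetical cue-phrase and stopword lists, for illustration only.
CUE_PHRASES = ("in conclusion", "in summary", "this paper", "we propose")
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "we", "this", "that", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(sentences, w_pos=1.0, w_cue=1.0, w_freq=1.0):
    """Score each sentence by position, cue phrases, and content-word frequency."""
    content = [w for s in sentences for w in tokenize(s) if w not in STOPWORDS]
    freq = Counter(content)
    top = max(freq.values(), default=1)
    n = len(sentences)
    scores = []
    for i, s in enumerate(sentences):
        # Position feature: the first sentence gets 1.0, later ones less.
        pos = 1.0 - i / n
        # Cue feature: 1.0 if any hypothetical cue phrase occurs.
        cue = 1.0 if any(p in s.lower() for p in CUE_PHRASES) else 0.0
        # Frequency feature: mean document frequency of the sentence's
        # content words, normalized by the most frequent word's count.
        words = [w for w in tokenize(s) if w not in STOPWORDS]
        f = sum(freq[w] for w in words) / (top * len(words)) if words else 0.0
        scores.append(w_pos * pos + w_cue * cue + w_freq * f)
    return scores


def extract(sentences, k=2, **weights):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences, **weights)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]
```

For example, with the four sentences "Summarization systems select important sentences.", "The weather was pleasant yesterday.", "Frequent words often signal important topics for summarization.", and "In conclusion, sentence selection drives extractive summarization.", `extract(sentences, k=2)` returns the first and last sentences. Fixed weights keep the sketch simple; a practical system would more likely tune them against reference summaries.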
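Concept generalization in topic fusion can be illustrated by counting concepts in an is-a hierarchy: if a text mentions apples, bananas, and pears, the shared parent "fruit" may stand for all of them in a summary. The Python sketch below is one possible reading of that idea, not the tutorial's algorithm. It assumes a tiny hypothetical hierarchy (`PARENT`) and a hypothetical threshold (`max_child_share`), and it only proposes candidate parent concepts; it does not address how such world knowledge would be acquired at scale.

```python
from collections import Counter

# Hypothetical toy is-a hierarchy (child -> parent); a real system would
# need a large lexical or world-knowledge resource instead.
PARENT = {
    "apple": "fruit", "banana": "fruit", "pear": "fruit",
    "fruit": "food", "bread": "food",
}


def ancestors(concept):
    """Return the chain of parents above a concept, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def concept_counts(words):
    """Count each word and propagate its count to every ancestor concept."""
    counts = Counter()
    for w in words:
        counts[w] += 1
        for a in ancestors(w):
            counts[a] += 1
    return counts


def generalize(words, max_child_share=0.5):
    """Return parent concepts whose count is spread across several children.

    A parent is a fusion candidate when at least two of its children occur
    and no single child accounts for more than max_child_share of its count.
    """
    counts = concept_counts(words)
    children = {}
    for child, parent in PARENT.items():
        if counts[child]:
            children.setdefault(parent, []).append(child)
    result = []
    for parent, kids in children.items():
        top = max(counts[k] for k in kids)
        if len(kids) >= 2 and top / counts[parent] <= max_child_share:
            result.append(parent)
    return sorted(result)
```

With this toy hierarchy, `generalize(["apple", "banana", "pear", "bread"])` proposes `["fruit"]`, while a text dominated by one child, such as `["apple", "apple", "apple", "banana"]`, proposes nothing, since "apple" alone already characterizes it.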
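One common family of evaluation measures for extracts compares the sentences a system selects with those in a human reference extract. The Python sketch below computes precision, recall, and F1 over sentence identifiers, and applies the same comparison to a single feature by ranking sentences by that feature alone, which is one way to measure how well an individual feature predicts importance. The function names are hypothetical, and this is only one of the measures such an evaluation might use.

```python
def coselection_scores(system, reference):
    """Precision, recall, and F1 of a system extract against a reference extract.

    Both arguments are collections of sentence identifiers (e.g. indices).
    """
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)
    precision = overlap / len(sys_set) if sys_set else 0.0
    recall = overlap / len(ref_set) if ref_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def feature_precision_at_k(values, reference, k):
    """Precision of the k sentences ranked highest by one feature's values."""
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return coselection_scores(top, reference)[0]
```

For instance, a system extract of sentences `[0, 3, 5]` scored against a reference of `[0, 2, 3, 7]` shares two sentences, giving precision 2/3, recall 1/2, and F1 4/7.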