Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics
- Chin-Yew Lin
- E.H. Hovy
Organized by HLT | NAACL
Following the recent adoption by the machine translation community of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
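As a rough illustration of what a unigram co-occurrence score between a pair of summaries might look like, here is a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function name `unigram_cooccurrence`, whitespace tokenization, lowercasing, clipped counts, and normalization by the reference length (a recall-style score).

```python
from collections import Counter


def unigram_cooccurrence(candidate: str, reference: str) -> float:
    """Recall-style unigram overlap between a candidate and a reference summary.

    Assumed for illustration only: whitespace tokenization, lowercasing, and
    clipped counts (a reference word is matched at most as many times as it
    occurs in the candidate). This is not taken from the paper itself.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total_ref = sum(ref_counts.values())
    if total_ref == 0:
        # An empty reference gives nothing to match against.
        return 0.0
    # Count each reference unigram that also occurs in the candidate.
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total_ref


# Hypothetical summary pair: five of the six reference words are matched.
score = unigram_cooccurrence("the cat sat on the mat", "the cat lay on the mat")
```

Normalizing by the reference length rewards a candidate for covering the reference content. A BLEU-style score would instead normalize by the candidate length and combine several n-gram orders, which is the kind of direct application the abstract reports as less reliable for summaries.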