Re-evaluating Machine Translation Results with Paraphrase Support
- Liang Zhou,
- Chin-Yew Lin,
- Eduard Hovy
In this paper, we present ParaEval, an automatic evaluation framework that uses paraphrases to improve the quality of machine translation evaluations. Previous work has focused on fixed n-gram evaluation metrics coupled with lexical identity matching. ParaEval addresses three important issues: support for paraphrase/synonym matching, recall measurement, and correlation with human judgments. We show that ParaEval correlates significantly better with human assessments than BLEU does, in measurements of both fluency and adequacy.
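To illustrate the difference between strict lexical identity matching and paraphrase-supported matching, the sketch below contrasts a plain unigram recall score with one that also accepts synonyms from a small lookup table. This is a minimal, hypothetical example (the `PARAPHRASES` table, sentences, and scoring are invented for illustration and simplified to single-word paraphrases); it is not the authors' ParaEval implementation.

```python
from collections import Counter

# Hypothetical single-word paraphrase/synonym table: each word maps to its
# equivalence set. Illustrative only; not the paraphrase resource used in ParaEval.
PARAPHRASES = {
    "died": {"died", "perished"},
    "perished": {"died", "perished"},
    "car": {"car", "automobile"},
    "automobile": {"car", "automobile"},
}

def lexical_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams found verbatim in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    matched = sum(min(cand[w], count) for w, count in ref.items())
    return matched / max(sum(ref.values()), 1)

def paraphrase_recall(candidate: str, reference: str) -> float:
    """Like lexical_recall, but a reference word also counts as matched when
    any word in its paraphrase set appears in the candidate."""
    cand_words = set(candidate.lower().split())
    ref_words = reference.lower().split()
    matched = sum(
        1 for w in ref_words if PARAPHRASES.get(w, {w}) & cand_words
    )
    return matched / max(len(ref_words), 1)

if __name__ == "__main__":
    reference = "the old man died yesterday"
    candidate = "the old man perished yesterday"
    print("lexical recall:   ", lexical_recall(candidate, reference))    # 0.8
    print("paraphrase recall:", paraphrase_recall(candidate, reference)) # 1.0
```

In this toy case, strict lexical matching penalizes the candidate for using "perished" instead of "died", while the paraphrase-aware score credits the match, which is the kind of discrepancy paraphrase support is meant to address.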