Mind the Gap: Learning to Choose Gaps for Question Generation

Proceedings of NAACL-HLT 2012

Published by the Association for Computational Linguistics

Not all learning takes place in an educational setting: more and more self-motivated learners are turning to online text to learn about new topics. Our goal is to provide such learners with the well-known benefits of testing by automatically generating quiz questions for online text. Prior work on question generation has focused on the grammaticality of generated questions and on generating effective multiple-choice distractors for individual question targets, both key parts of this problem. Our work focuses on the complementary aspect of determining what part of a sentence we should be asking about in the first place; we call this “gap selection.” We address this problem by asking human judges about the quality of questions generated from a Wikipedia-based corpus, and then training a model to effectively replicate these judgments. Our data shows that good gaps are of variable length and span all semantic roles, i.e., nouns as well as verbs, and that a majority of good questions do not focus on named entities. Our resulting system can generate fill-in-the-blank (cloze) questions from generic source materials.
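To make the gap-selection idea concrete, the sketch below enumerates candidate gaps in a sentence, blanks each one out to form a cloze question, and ranks the candidates with a scoring function. The paper learns that scorer from human quality judgments; the toy heuristic here (favoring longer, capitalized spans) and the use of plain token spans instead of parse- or semantic-role-derived constituents are simplifying assumptions made only to keep the example self-contained.

```python
# Minimal sketch of gap selection for cloze question generation.
# Assumption: a toy heuristic stands in for the learned quality model
# trained on judge ratings, and candidate gaps are plain token spans
# rather than parse-derived constituents as in the paper.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClozeQuestion:
    question: str   # sentence with the gap replaced by a blank
    answer: str     # text removed to create the gap
    score: float    # estimated question quality


def candidate_gaps(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Enumerate contiguous token spans of length 1..max_len as candidate gaps."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans


def score_gap(tokens: List[str], span: Tuple[int, int]) -> float:
    """Placeholder for a learned quality model: prefers longer, capitalized spans."""
    start, end = span
    gap = tokens[start:end]
    length_bonus = 0.1 * len(gap)
    caps_bonus = 0.5 if any(t[:1].isupper() for t in gap) else 0.0
    return length_bonus + caps_bonus


def generate_cloze(sentence: str, top_k: int = 3) -> List[ClozeQuestion]:
    """Blank out each candidate gap and return the top-scoring cloze questions."""
    tokens = sentence.split()
    questions = []
    for start, end in candidate_gaps(tokens):
        blanked = tokens[:start] + ["_____"] + tokens[end:]
        questions.append(ClozeQuestion(
            question=" ".join(blanked),
            answer=" ".join(tokens[start:end]),
            score=score_gap(tokens, (start, end)),
        ))
    questions.sort(key=lambda q: q.score, reverse=True)
    return questions[:top_k]


if __name__ == "__main__":
    for q in generate_cloze("Marie Curie won the Nobel Prize in Physics in 1903."):
        print(f"{q.score:.2f}  {q.question}  [answer: {q.answer}]")
```

In the full system, the scoring function would be replaced by a model trained on the human ratings collected in the corpus below, and the candidate spans would come from syntactic and semantic analysis of the sentence.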

Publication Downloads

Question-Generation Corpus

May 22, 2012

This corpus contains 2,250 candidate fill-in-the-blank questions and answers generated from sentences taken from 105 articles on Wikipedia's listing of vital articles and popular pages, together with quality ratings from multiple judges and unique judge IDs. The paper for which this corpus was originally developed is Becker, L., Basu, S., and Vanderwende, L. “Mind the Gap: Learning to Choose Gaps for Question Generation.” In Proceedings of NAACL-HLT 2012.
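A common first step when working with the ratings is to aggregate the per-judge scores into a single quality label per question. The corpus's on-disk layout is not described on this page, so the tab-separated format and column names below (question_id, judge_id, rating) are purely hypothetical; consult the corpus documentation for the actual schema.

```python
# Hypothetical sketch: aggregate per-judge ratings into one mean quality
# score per question. The TSV layout and column names are assumptions,
# not the corpus's actual schema.
import csv
from collections import defaultdict
from statistics import mean


def mean_rating_per_question(path: str) -> dict:
    """Return {question_id: mean rating across judges} from a ratings TSV."""
    ratings = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            ratings[row["question_id"]].append(float(row["rating"]))
    return {qid: mean(scores) for qid, scores in ratings.items()}


if __name__ == "__main__":
    # Example usage with a hypothetical file name.
    scores = mean_rating_per_question("question_ratings.tsv")
    for qid, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
        print(qid, round(score, 2))
```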