MSR KMG at TREC 2014 KBA Track Vital Filtering Task
In this paper, we present our strategy for the TREC 2014 KBA track Vital Filtering task. This task was known as "Cumulative Citation Recommendation" (CCR) in 2012 and 2013. The goal of Vital Filtering is to identify "vital" documents, that is, documents containing timely and new information that should be used to update the profile of a given entity (also called a topic). Our strategy is to first retrieve as many relevant documents as possible and then apply classification and ranking methods to separate vital documents from non-vital ones. We index the corpus and retrieve candidate documents using phrase queries built from each entity's name and its redirect names. We then learn to rank the candidates using four types of features: 1) time range: earlier documents receive a higher score than later documents, 2) temporal feature: bursts of entity mentions, 3) title/profession feature: the title and profession information around an entity mention, and 4) action pattern: the entity name and its associated verb in the sentence mentioning the entity. A simple global adjustment is applied at the end to further improve system performance. Our experimental results confirm that these features are very effective, especially action pattern and time range. The system incorporating all the proposed features significantly outperforms the phrase query baseline.
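As an illustration only, the candidate retrieval step and a burst-style temporal feature might be sketched as follows. The function names, the regex-based phrase matching (standing in for an index), the one-hour bucket size, and the particular burst definition are all assumptions made for this sketch; they are not details stated in this abstract.

```python
import re
from collections import Counter


def build_phrase_patterns(name, redirects):
    """Compile one case-insensitive exact-phrase pattern per surface form.

    Assumption: a regex scan stands in for phrase queries against an index.
    """
    forms = {f.strip() for f in [name, *redirects] if f.strip()}
    # Lookarounds keep "Jane Doe" from matching inside "Jane Doerr".
    return [
        re.compile(r"(?<!\w)" + re.escape(f) + r"(?!\w)", re.IGNORECASE)
        for f in sorted(forms)
    ]


def is_candidate(text, patterns):
    """A document is a candidate if any surface form occurs as a phrase."""
    return any(p.search(text) for p in patterns)


def burst_scores(docs, patterns, bucket_seconds=3600):
    """Hypothetical burst feature per time bucket.

    docs is an iterable of (unix_timestamp, text) pairs. The score of a
    bucket is its number of mentioning documents divided by the mean count
    over all buckets that contain at least one mentioning document.
    """
    counts = Counter()
    for timestamp, text in docs:
        if is_candidate(text, patterns):
            counts[timestamp // bucket_seconds] += 1
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {bucket: count / mean for bucket, count in counts.items()}
```

In a sketch like this, a document's burst score would be looked up from its time bucket and passed to the ranker alongside the other three feature types.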