Retrieval consistency in the presence of query variations

Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)

Published by ACM

A search engine that can return the ideal results for a person’s information need, independent of the specific query that is used to express that need, would be preferable to one that is overly swayed by the individual terms used; search engines should be consistent in the presence of syntactic query variations that express the same information need. In this paper we examine the retrieval consistency of a set of five systems responding to syntactic query variations over one hundred topics, working with the UQV100 test collection, and using Rank-Biased Overlap (RBO) relative to a centroid ranking over the query variations per topic as a measure of consistency. We also introduce a new data fusion algorithm, Rank-Biased Centroid (RBC), for constructing a centroid ranking over a set of rankings from query variations for a topic. RBC is compared with alternative data fusion algorithms.
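The consistency measurement described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the persistence value of 0.9, and the toy rankings are all hypothetical. The RBO function computes only the truncated prefix sum over the evaluated depth (no extrapolation), and the centroid function assumes a geometric rank weighting of (1 - phi) * phi^(rank - 1) summed across rankings, which is one plausible reading of a rank-biased fusion and is not confirmed by the abstract.

```python
from collections import defaultdict


def rbo_truncated(run_a, run_b, p=0.9, depth=None):
    """Truncated Rank-Biased Overlap between two rankings.

    Sums (1 - p) * p^(d - 1) * agreement(d) for d = 1..depth, where
    agreement(d) is the fraction of items shared by the two top-d prefixes.
    This is the prefix sum only, so identical rankings of length k score
    1 - p^k rather than 1.
    """
    limit = min(len(run_a), len(run_b))
    depth = limit if depth is None else min(depth, limit)
    score = 0.0
    for d in range(1, depth + 1):
        shared = len(set(run_a[:d]) & set(run_b[:d]))
        score += (p ** (d - 1)) * shared / d
    return (1 - p) * score


def geometric_centroid(runs, phi=0.9):
    """Fuse rankings by summing geometric rank weights per document.

    ASSUMPTION: this weighting is an illustrative stand-in for a
    rank-biased centroid; the source abstract does not give the formula.
    """
    weights = defaultdict(float)
    for run in runs:
        for rank, doc in enumerate(run, start=1):
            weights[doc] += (1 - phi) * phi ** (rank - 1)
    # Sort by descending weight, breaking ties by document id.
    return sorted(weights, key=lambda doc: (-weights[doc], doc))


def topic_consistency(runs, p=0.9, phi=0.9):
    """Mean truncated RBO of each query variation's run to the centroid."""
    centroid = geometric_centroid(runs, phi=phi)
    scores = [rbo_truncated(run, centroid, p=p) for run in runs]
    return sum(scores) / len(scores)


# Hypothetical rankings returned for three query variations of one topic.
variation_runs = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d3"],
    ["d1", "d3", "d2"],
]
centroid_ranking = geometric_centroid(variation_runs)
consistency = topic_consistency(variation_runs)
```

In this sketch a system whose rankings barely change across query variations yields runs that sit close to their centroid, so its per-topic score approaches the maximum attainable at the evaluated depth.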

Our results indicate that consistency is positively correlated to a moderate degree with “deep” relevance measures. However, it is only weakly correlated with “shallow” relevance measures, as well as measures of topic complexity and variety in query expression. These findings support the notion that consistency is an independent property of a search engine’s retrieval effectiveness.