Contextual and Dimensional Relevance Judgments for Reusable SERP-level Evaluation

The 23rd International World Wide Web Conference (WWW 2014)

Published by ACM

Document-level relevance judgments are a major component in the calculation of effectiveness metrics, so collecting high-quality judgments is a critical step in information retrieval evaluation. However, the nature of, and the assumptions underlying, relevance judgment collection have not received much attention. In particular, relevance judgments are typically collected for each document in isolation, even though users read each document in the context of other documents. In this work, we investigate the nature of relevance judgment collection. We collect relevance labels in both isolated and conditional settings, and ask for judgments on various dimensions of relevance as well as overall relevance. We then compare effectiveness metrics based on these different types of judgments against other measures of quality, such as side-by-side user preference. Our analyses illuminate how these judgment collection settings affect the quality and characteristics of the judgments. We also find that metrics based on conditional judgments show higher correlation with user preference than those based on isolated judgments.
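
As a minimal sketch of the pipeline the abstract describes, the Python snippet below shows how document-level graded judgments feed a standard effectiveness metric (DCG/nDCG here, as one common choice) and how a SERP-level metric score can be checked for agreement with a pairwise user preference. The judgment values, SERP names, and preference data are illustrative assumptions, not data or code from the paper.

```python
import math

def dcg(gains):
    """Discounted cumulative gain for a ranked list of graded judgments."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(gains):
    """DCG normalised by the ideal (descending-gain) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical graded judgments (0-3) for two SERPs on the same query,
# collected under two settings: each document judged in isolation, and
# judged conditionally (in the context of the other documents shown).
isolated    = {"serp_A": [3, 2, 2, 0, 1], "serp_B": [2, 3, 1, 1, 0]}
conditional = {"serp_A": [3, 1, 1, 0, 0], "serp_B": [2, 3, 1, 0, 0]}

# A side-by-side user preference for the same query: +1 means users
# preferred serp_A, -1 means serp_B (purely illustrative).
user_preference = +1

def metric_preference(judgments):
    """Which SERP the metric prefers: +1 for serp_A, -1 for serp_B, 0 for a tie."""
    diff = ndcg(judgments["serp_A"]) - ndcg(judgments["serp_B"])
    return (diff > 0) - (diff < 0)

for name, judgments in [("isolated", isolated), ("conditional", conditional)]:
    agrees = metric_preference(judgments) == user_preference
    print(f"{name:11s} judgments: metric agrees with user preference: {agrees}")
```

In the actual study this comparison would be aggregated over many queries to estimate how often metrics built from each judgment type agree with observed user preferences; the snippet only illustrates the per-query building block.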