cwl_eval: An evaluation tool for information retrieval

Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)

We present a tool (“cwl_eval”) which unifies many metrics typically used to evaluate information retrieval systems using test collections. In the C/W/L framework, metrics are specified via a single function which can be used to derive a number of related measurements: Expected Utility per item, Expected Total Utility, Expected Cost per item, Expected Total Cost, and Expected Depth. The C/W/L framework brings together several independent approaches for measuring the quality of a ranked list, and provides a coherent user model-based framework for developing measures based on utility (gain) and cost. Here we outline the C/W/L measurement framework, describe the cwl_eval architecture, and provide examples of its use. We provide implementations of a number of recent metrics, including Time Biased Gain, U-Measure, Bejewelled Measure, and the Information Foraging Based Measure, as well as traditional metrics such as Precision, Average Precision, Discounted Cumulative Gain, Rank-Biased Precision, and INST. By providing state-of-the-art and traditional metrics within the same framework, we promote a standardised approach to evaluating search effectiveness.
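To make the derivation concrete, the sketch below (in Python, the language of cwl_eval) shows how a single continuation probability function C(i) yields the five C/W/L measurements over a truncated ranking. The function names, signatures, and the RBP-style example are illustrative assumptions for exposition only; they are not the actual cwl_eval API.

```python
# Minimal sketch of the C/W/L relationships for a truncated ranking.
# Assumption: names such as cwl_measures are hypothetical, not cwl_eval's API.
from typing import Callable, Sequence


def cwl_measures(gains: Sequence[float],
                 costs: Sequence[float],
                 continuation: Callable[[int], float]) -> dict:
    """Derive C/W/L measurements from a continuation probability C(i).

    gains[i]        -- gain of the item at rank i+1
    costs[i]        -- cost of inspecting the item at rank i+1
    continuation(r) -- probability of continuing from rank r to rank r+1
                       (ranks are 1-based)
    """
    n = len(gains)

    # Probability of reaching rank r: the product C(1) * ... * C(r-1).
    reach = [1.0]
    for r in range(1, n):
        reach.append(reach[-1] * continuation(r))

    # W(i): normalised weight placed on the item at rank i (sums to one).
    total = sum(reach)
    W = [r / total for r in reach]

    # L(i): probability that rank i is the last item examined.
    # A stop is forced at the truncation depth so that L sums to one.
    L = [reach[i] * (1.0 - continuation(i + 1)) for i in range(n - 1)]
    L.append(reach[-1])

    cum_gain = [sum(gains[:i + 1]) for i in range(n)]
    cum_cost = [sum(costs[:i + 1]) for i in range(n)]

    return {
        "expected_utility_per_item": sum(w * g for w, g in zip(W, gains)),
        "expected_total_utility": sum(l * cg for l, cg in zip(L, cum_gain)),
        "expected_cost_per_item": sum(w * c for w, c in zip(W, costs)),
        "expected_total_cost": sum(l * cc for l, cc in zip(L, cum_cost)),
        "expected_depth": sum((i + 1) * l for i, l in enumerate(L)),
    }


# Example: a constant continuation probability C(i) = p gives an RBP-style
# user model; the expected utility per item then approximates the RBP score
# (and equals it exactly in the limit of an unbounded ranking).
p = 0.8
gains = [1.0, 0.0, 1.0, 0.0, 0.0]
costs = [1.0] * len(gains)
print(cwl_measures(gains, costs, lambda rank: p))
```

Under this formulation, other metrics differ only in the continuation function supplied: swapping the constant p for a rank- or gain-dependent C(i) yields the other measures discussed in the paper, while the five derived quantities are computed in exactly the same way.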