Toward Predicting the Outcome of an A/B Experiment for Search Relevance

  • Lihong Li
  • Imed Zitouni
  • Jin Young Kim

Proceedings of the 8th ACM International Conference on Web Search and Data Mining

Published by ACM - Association for Computing Machinery

A standard approach to estimating online click-based metrics
of a ranking function is to run it in a controlled experiment
on live users. While reliable and popular in practice,
configuring and running an online experiment is cumbersome
and time-intensive. In this work, inspired by recent
successes of offline evaluation techniques for recommender
systems, we study an alternative that uses historical search
logs to reliably predict online click-based metrics of a new
ranking function, without actually running it on live users.
To tackle novel challenges encountered in Web search,
variations of the basic techniques are proposed. The first
is to take advantage of diversified behavior of a search engine
over a long period of time to simulate randomized data
collection, so that our approach can be used at very low cost.
The second is to replace exact matching (of recommended
items in previous work) by fuzzy matching (of search result
pages) to increase data efficiency, via a better trade-off
of bias and variance. Extensive experimental results based
on large-scale real search data from a major commercial
search engine in the US market demonstrate our approach
is promising and has potential for wide use in Web search.
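
The contrast between exact and fuzzy matching can be illustrated with a minimal sketch. The abstract does not specify the estimator, so everything below is an assumption made for illustration: the log format (query, result page, click reward), the hypothetical names estimate_metric, exact_key, and top_k_key, and the use of empirical frequencies in the log as stand-in propensities. It is not the authors' implementation.

```python
from collections import Counter

def exact_key(serp):
    """Exact matching: two result pages match only if they are identical."""
    return tuple(serp)

def top_k_key(k):
    """Fuzzy matching (illustrative choice): pages match if their top-k results agree."""
    return lambda serp: tuple(serp[:k])

def estimate_metric(logs, new_ranker, match_key):
    """Estimate the average reward of new_ranker from logged data.

    logs: list of (query, serp, reward) tuples from historical traffic.
    new_ranker: dict mapping query -> serp the new ranking function would show.
    match_key: function mapping a serp to a hashable key used for matching.
    """
    # Empirical propensity of each matched page class: how often the log
    # showed a page with this key for this query (an assumption of this sketch).
    pair_counts = Counter((q, match_key(s)) for q, s, _ in logs)
    query_counts = Counter(q for q, _, _ in logs)

    total = 0.0
    for q, s, r in logs:
        if q in new_ranker and match_key(s) == match_key(new_ranker[q]):
            propensity = pair_counts[(q, match_key(s))] / query_counts[q]
            total += r / propensity  # inverse-propensity weighting
    # Queries with no matching logged page contribute zero (a source of bias).
    return total / len(logs)

# Toy log, invented for illustration only.
logs = [
    ("q1", ("a", "b", "c"), 1.0),
    ("q1", ("a", "b", "d"), 0.0),
    ("q1", ("b", "a", "c"), 0.0),
    ("q1", ("a", "b", "c"), 1.0),
]
new_ranker = {"q1": ("a", "b", "d")}

exact_estimate = estimate_metric(logs, new_ranker, exact_key)
fuzzy_estimate = estimate_metric(logs, new_ranker, top_k_key(2))
```

On this toy log, exact matching uses a single logged record, while top-2 matching uses three. Pooling more records lowers variance, at the cost of bias whenever the pooled pages would actually draw different click behavior, which is the trade-off the abstract refers to.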