The TREC Terabyte Retrieval Track

SIGIR Forum

\url{http://research.microsoft.com/users/nickcr/pubs/clarke_sigirforum05.pdf}

The Terabyte Retrieval Track of the Text REtrieval Conference (TREC) provides an opportunity
to test retrieval techniques and evaluation methodologies in the context of a terabyte-scale corpus.
Given the size of the corpus, the track also provides a vehicle for participants to investigate query
and indexing speeds. This brief summary outlines track activities to date and previews our plans for
TREC 2005. For complete information, the reader should consult the full version of this report [1].

A proposal for the Terabyte Track was developed during a SIGIR 2003 workshop and was
accepted by the TREC program committee for inclusion in TREC 2004. For the initial year of
operation, we decided to base the track on a crawl of the .gov domain, since we believed that this
would generate roughly a terabyte of data and would provide a realistic setting in which
both link structure and anchor text could be productively exploited.