TREC-10 Web Track Experiments at MSRCN

  • Guihong Cao
  • Hongzhao He
  • Min Zhang
  • Jian-Yun Nie
  • Stephen Walker
  • Stephen Robertson

TREC-10

In TREC-10, Microsoft Research China (MSRCN) participated in the Web track (ad hoc retrieval task and homepage finding task). The latest version of the Okapi system (Windows 2000 version) was used. We focused on developing content-based and link-based retrieval and investigated how best to combine the two. For content-based retrieval, we examined the weighting scheme, re-weighting, and pseudo-relevance feedback (PRF). We then developed a method called collection enrichment (CE) for query expansion (QE).

We investigated the use of two kinds of link information: link anchor and link structure. We used anchor descriptions instead of content text to build the index. In addition, we tested different search strategies, such as spreading activation and PageRank.
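
As an illustration of the link-structure side, the following is a minimal sketch of the standard PageRank power iteration. It is not the implementation used in these experiments: the function name `pagerank`, the dictionary-based graph representation, the damping factor of 0.85, and the fixed iteration count are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links: dict mapping a page to the list of pages it links to
           (a hypothetical toy representation, not the paper's data format).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Each page passes an equal share of its rank to every out-link.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: "c" is linked from three pages, "d" has no in-links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

In a retrieval setting, scores like these would typically be combined with a content-based score at ranking time; how the two were combined here is not specified in this abstract.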

Experimental results show: (1) The Okapi system is robust and effective for web retrieval. (2) In the ad hoc task, content-based retrieval achieved much better performance, and the impact of anchor text was negligible; in the homepage finding task, both anchor text and content text provided useful information, contributing more to precision and recall, respectively. (3) Although query expansion did not show any improvement in our web retrieval experiments, we believe that there is still potential for CE.