Improving Query Translation for CLIR Using Statistical Models

Jianfeng Gao; Jian-Yun Nie; Endong Xun; Jian Zhang; Ming Zhou; Changning Huang

SIGIR'01, New Orleans, Louisiana, USA | September 2001

Dictionaries have often been used for query translation in cross-language information retrieval (CLIR). However, we are faced with the problem of translation ambiguity, i.e., a dictionary stores multiple translations for a word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve on previous dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole by using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques result in significant improvements over the simple dictionary approaches, and achieve even better performance than a high-quality machine translation system.
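
The second technique, choosing word translations by their mutual cohesion, can be illustrated with a toy sketch. The abstract does not say how cohesion is measured or how the search is carried out, so everything below is an assumption made for illustration: the function name `select_translations`, the brute-force search over all combinations, the summed pairwise score, and the invented cohesion values. It is not the authors' model.

```python
from itertools import combinations, product


def select_translations(candidates, cohesion):
    """Pick one translation per query word so that the chosen words
    cohere best with each other.

    candidates: list of lists, one list of candidate translations per
                source word (hypothetical dictionary output).
    cohesion:   dict mapping frozenset({a, b}) to a score; the scores
                are assumed to come from some target-language statistic.
    """
    def score(combo):
        # Sum the cohesion of every pair of chosen translations;
        # pairs missing from the table count as 0.
        return sum(
            cohesion.get(frozenset(pair), 0.0)
            for pair in combinations(combo, 2)
        )

    # Exhaustive search: fine for short queries, exponential in general.
    return max(product(*candidates), key=score)


# Hypothetical English query "bank interest" with ambiguous entries.
candidates = [
    ["银行", "河岸"],   # bank: financial institution / river bank
    ["利息", "兴趣"],   # interest: money earned / curiosity
]

# Invented cohesion scores, for illustration only.
cohesion = {
    frozenset({"银行", "利息"}): 0.9,
    frozenset({"银行", "兴趣"}): 0.1,
    frozenset({"河岸", "兴趣"}): 0.05,
}

best = select_translations(candidates, cohesion)
print(best)
```

With these made-up scores, the financial senses of both words are selected together, whereas picking the first dictionary entry for each word independently would succeed here only by accident of ordering. A real system would need cohesion estimates from a target-language corpus and a search that scales beyond short queries.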