Improving Query Translation for CLIR Using Statistical Models
- Jianfeng Gao
- Jian-Yun Nie
- Endong Xun
- Jian Zhang
- Ming Zhou
- Changning Huang
SIGIR'01, New Orleans, Louisiana, USA
Dictionaries have often been used for query translation in cross-language information retrieval (CLIR). However, we are faced with the problem of translation ambiguity, i.e., a dictionary stores multiple translations for a word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve on previous dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole, using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques yield significant improvements over simple dictionary-based approaches and achieve even better performance than a high-quality machine translation system.
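The idea of choosing translations by cohesion can be illustrated with a small sketch. It assumes that cohesion between two target-language words is measured by smoothed pointwise mutual information over co-occurrence counts, and that the best combination is found by exhaustive search. The counts, the PMI measure, the smoothing constant, and the names `cohesion` and `select_translations` are all hypothetical illustrations, not the paper's actual model or data.

```python
from itertools import combinations, product
from math import log

# Hypothetical toy statistics (invented, NOT from the paper): unigram and
# pair co-occurrence counts over N text windows of a Chinese corpus.
N = 10_000
UNIGRAM = {"信息": 400, "情报": 150, "检索": 120, "恢复": 200}
PAIR = {
    frozenset(("信息", "检索")): 60,
    frozenset(("情报", "检索")): 5,
    frozenset(("信息", "恢复")): 8,
    frozenset(("情报", "恢复")): 1,
}


def cohesion(a: str, b: str, alpha: float = 0.5) -> float:
    """Assumed cohesion measure: PMI with additive smoothing on the joint count."""
    joint = PAIR.get(frozenset((a, b)), 0) + alpha
    return log(joint * N / (UNIGRAM[a] * UNIGRAM[b]))


def select_translations(candidates: list[list[str]]) -> tuple[str, ...]:
    """Pick one translation per query word, maximizing summed pairwise cohesion.

    Exhaustive search over all combinations: fine for short queries, but the
    number of combinations grows exponentially with query length.
    """
    def score(combo: tuple[str, ...]) -> float:
        return sum(cohesion(a, b) for a, b in combinations(combo, 2))

    return max(product(*candidates), key=score)


# "information retrieval": each English word has two dictionary translations.
query = [["信息", "情报"], ["检索", "恢复"]]
print(select_translations(query))  # ('信息', '检索')
```

With these invented counts, 信息/检索 wins because that pair co-occurs far more often than chance predicts, while each alternative pairing is much less cohesive.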