Improving Query Translation for CLIR Using Statistical Models

  • ,
  • Jian-Yun Nie ,
  • Endong Xun ,
  • Jian Zhang ,
  • Ming Zhou ,
  • Changning Huang

SIGIR'01, New Orleans, Louisiana, USA

Dictionaries are often used for query translation in cross-language information retrieval (CLIR). However, this approach suffers from translation ambiguity, i.e. a dictionary typically stores multiple translations for a word. In addition, word-by-word query translation is not precise enough. In this paper, we explore several methods to improve dictionary-based query translation. First, as many noun phrases as possible are recognized and translated as a whole, using statistical models and phrase translation patterns. Second, the best word translations are selected based on the cohesion of the translation words. Our experimental results on the TREC English-Chinese CLIR collection show that these techniques yield significant improvements over simple dictionary approaches, and achieve even better performance than a high-quality machine translation system.
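
The second idea, selecting word translations by their mutual cohesion, can be illustrated with a minimal sketch. The abstract does not specify the cohesion measure or the search procedure, so everything below is an assumption made for illustration: the function name `select_translations`, the exhaustive search over all combinations, the sum of pairwise scores as the objective, and the toy cohesion values (which in practice might be estimated from co-occurrence counts in a target-language corpus).

```python
from itertools import combinations, product


def select_translations(candidates, cohesion):
    """Pick one translation per query word so that total pairwise cohesion is maximal.

    candidates: list of lists; candidates[i] holds the dictionary
                translations of the i-th query word.
    cohesion:   dict mapping frozenset({t1, t2}) to a score
                (hypothetical values; unseen pairs score 0.0).
    """
    best_combo, best_score = (), float("-inf")
    # Exhaustive search: try every way of choosing one translation per word.
    for combo in product(*candidates):
        # Score a combination by summing cohesion over all pairs of chosen words.
        score = sum(
            cohesion.get(frozenset(pair), 0.0)
            for pair in combinations(combo, 2)
        )
        if score > best_score:
            best_combo, best_score = combo, score
    return list(best_combo)


# Toy example (made-up data): the query "bank interest", with two
# transliterated candidate translations per word.
candidates = [["yinhang", "hean"], ["lilv", "xingqu"]]
cohesion = {
    frozenset({"yinhang", "lilv"}): 0.9,    # financial bank + interest rate
    frozenset({"yinhang", "xingqu"}): 0.2,  # financial bank + hobby
    frozenset({"hean", "xingqu"}): 0.1,     # river bank + hobby
}
print(select_translations(candidates, cohesion))  # ['yinhang', 'lilv']
```

Exhaustive search is exponential in query length, so a sketch like this is only practical for short queries; a greedy or approximate search would be a natural substitute for longer ones.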