NTCIR-3 CLIR Experiments at MSRA

This paper describes three statistical models for resolving query translation ambiguity in cross-language information retrieval (CLIR). First, we present a decaying co-occurrence model. It extends traditional co-occurrence models with a decaying factor that reduces the mutual information between two terms as the distance between them increases. Second, we describe a phrase translation model that detects and translates noun phrases not stored in the dictionary. Finally, we propose a triple translation model that exploits linguistic dependency information. Experimental results on the TREC and NTCIR corpora show improvements from using these models.
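The abstract does not give the exact form of the decaying factor. The sketch below is a minimal illustration and assumes an exponential decay. Pointwise mutual information (PMI) is multiplied by exp(-alpha * (d - 1)), where d is the average distance between two co-occurring terms. Adjacent terms (d = 1) therefore keep their full PMI. One translation per query term is then chosen by maximizing the summed pairwise scores. The function names `decayed_mi` and `best_translation`, the `stats` layout, the value alpha = 0.8, and the exhaustive search over combinations are all hypothetical choices made for illustration. They are not taken from the paper.

```python
import math
from itertools import product


def decayed_mi(n_xy, n_x, n_y, n_total, avg_distance, alpha=0.8):
    """PMI of terms x and y, scaled by a distance-based decay.

    Assumption (illustrative only): decay = exp(-alpha * (avg_distance - 1)),
    so adjacent terms keep their full PMI and distant pairs are down-weighted.
    """
    if n_xy == 0:
        return 0.0  # never co-occur: contribute nothing
    pmi = math.log((n_xy * n_total) / (n_x * n_y))
    decay = math.exp(-alpha * (avg_distance - 1))
    return pmi * decay


def best_translation(candidates, stats, n_total, alpha=0.8):
    """Pick one candidate per query term, maximizing summed pairwise decayed MI.

    candidates: one list of target-language words per source query term.
    stats: hypothetical dict with
        "count": word -> corpus frequency
        "pair":  (x, y) -> (co-occurrence count, average distance)
    Exhaustive search is used here for clarity; it is exponential in query length.
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        score = 0.0
        for i in range(len(combo)):
            for j in range(i + 1, len(combo)):
                x, y = combo[i], combo[j]
                n_xy, dist = (stats["pair"].get((x, y))
                              or stats["pair"].get((y, x))
                              or (0, 1.0))
                score += decayed_mi(n_xy, stats["count"][x], stats["count"][y],
                                    n_total, dist, alpha)
        if score > best_score:
            best, best_score = combo, score
    return best


# Toy statistics (made up for illustration, not corpus data).
stats = {
    "count": {"bank": 100, "shore": 50, "interest": 80, "concern": 60},
    "pair": {
        ("bank", "interest"): (30, 1.5),
        ("bank", "concern"): (2, 4.0),
        ("shore", "interest"): (1, 6.0),
        ("shore", "concern"): (1, 5.0),
    },
}
choice = best_translation([["bank", "shore"], ["interest", "concern"]], stats, 10000)
print(choice)  # → ('bank', 'interest')
```

In this toy example, "bank" and "interest" co-occur often and close together, so their decayed MI dominates the other pairings. A pair that co-occurs just as often but farther apart would score lower. This is the behaviour the decaying factor is described as adding to a plain co-occurrence model.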