Cross-lingual C*ST*RD: English Access to Hindi Information
- Anton Leuski
- Chin-Yew Lin
- Liang Zhou
- Ulrich Germann
- Franz Josef Och
- Eduard Hovy
ACM Transactions on Asian Language Information Processing, Vol 2(3): pp. 245-269
We present C*ST*RD, a cross-language information delivery system that supports cross-language information retrieval, information space visualization and navigation, machine translation, and text summarization of single documents and clusters of documents. C*ST*RD was assembled and trained within one month in the context of DARPA’s Surprise Language Exercise, which selected as its source language a heretofore unstudied language, Hindi. Given the brief time, we could not create deep Hindi capabilities for all the modules; instead, we experimented with combining shallow Hindi capabilities, or even English-only modules, into one integrated system. Various possible configurations, with different tradeoffs in processing speed and ease of use, enable the rapid deployment of C*ST*RD to new languages under various conditions.