Multilingual Information Access: The MuST System and its Integration with Systran (Demo)
The goal of the MuST [1] project is to develop a prototype system that not only retrieves documents from multilingual collections, but also summarizes and translates the retrieved documents into the user's preferred language. We focus on integrating state-of-the-art technologies, try to identify the critical path to enabling multilingual information access, and propose possible solutions. As far as possible, the system employs existing resources and products, such as the search technologies from America Online (AOL)/Personal Library System (PLS) [2] and online Internet search engines. It incorporates web spider technology that lets users target their areas and languages of interest. It provides multilingual summarization technology developed at ISI [3] that lets users quickly judge the relevance of the retrieved documents. It also integrates deep and shallow translation engines, both those built at ISI [4] and commercially available ones, for online browsing of foreign-language texts. We use the World Wide Web as our source of multilingual documents and assume English is the source language. MuST currently handles English, Arabic, Japanese, Spanish, and Bahasa Indonesia. We plan to add more languages in the near future.
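The retrieve, summarize, and translate flow described above can be sketched as a simple pipeline. This is a minimal illustration only, not the MuST implementation: the names `Document`, `must_pipeline`, and the three toy component functions are hypothetical, the ordering (summarize first, then translate the summary) is an assumption, and the stubs perform no real retrieval, summarization, or translation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    text: str
    lang: str  # language code, e.g. "en" or "es"


# Hypothetical component signatures; real search, summarization, and
# translation engines would sit behind these callables.
Retriever = Callable[[str], List[Document]]
Summarizer = Callable[[Document], Document]
Translator = Callable[[Document, str], Document]


def must_pipeline(query: str, user_lang: str, retrieve: Retriever,
                  summarize: Summarizer, translate: Translator) -> List[Document]:
    """Retrieve documents, summarize each one, and translate any summary
    that is not already in the user's preferred language."""
    results = []
    for doc in retrieve(query):
        summary = summarize(doc)
        if summary.lang != user_lang:
            summary = translate(summary, user_lang)
        results.append(summary)
    return results


def toy_retrieve(query: str) -> List[Document]:
    # Stand-in: ignores the query and returns a fixed two-document corpus.
    return [
        Document("El sistema busca documentos. Luego los resume.", "es"),
        Document("The system retrieves documents. It then summarizes them.", "en"),
    ]


def toy_summarize(doc: Document) -> Document:
    # Stand-in: keeps only the first sentence of the document.
    first = doc.text.split(". ")[0].rstrip(".") + "."
    return Document(first, doc.lang)


def toy_translate(doc: Document, target: str) -> Document:
    # Stand-in: tags the text instead of actually translating it.
    return Document(f"[{doc.lang}->{target}] {doc.text}", target)


summaries = must_pipeline("documents", "en",
                          toy_retrieve, toy_summarize, toy_translate)
for s in summaries:
    print(s.lang, s.text)
```

Passing the components in as callables mirrors the abstract's emphasis on integrating existing engines: any search, summarization, or translation back end with a matching signature could be swapped in without changing the pipeline itself.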