A Comparative Study of Knowledge-Based Approaches for Cross-Language Information Retrieval

CS-TR-3897
Oard, Douglas W.
Dorr, Bonnie J.
Hackett, Paul G.
Katsova, Maria
Cross-language retrieval systems seek to use queries in one natural language to retrieve documents that may be written in another. The acquisition and representation of translation knowledge play a central role in this process. This paper explores the utility of two sources of manually encoded translation knowledge, bilingual dictionaries and translation lexicons, for cross-language retrieval. We have implemented six query translation techniques that use bilingual dictionaries, one based on lexical-semantic analysis, and one that directly uses the translation output of an existing machine translation system. These are compared with a document translation technique that uses output from the same machine translation system. Average precision measurements on portions of the TREC collection suggest three findings: arbitrarily selecting a single translation from a bilingual dictionary is typically no less effective than using every translation the dictionary lists; query translation using an existing machine translation system can be somewhat more effective than simple dictionary-based techniques; and, under some conditions, performing document translation rather than query translation may further improve retrieval effectiveness. (Also cross-referenced as UMIACS-TR-98-27.)
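The abstract contrasts two simple dictionary-based query translation strategies: keeping a single, arbitrarily chosen translation for each query term versus keeping every translation the dictionary lists. The following is a minimal Python sketch of that contrast. It assumes a bilingual dictionary represented as a mapping from a source-language term to a list of target-language translations. The function names, the choice of the first listed entry as the "arbitrary" translation, the pass-through of terms missing from the dictionary, and the toy Spanish-to-English entries are all hypothetical illustrations, not the report's implementation or data.

```python
from typing import Dict, List

# Hypothetical representation: source term -> list of target-language translations.
BilingualDict = Dict[str, List[str]]


def translate_single(query: List[str], lexicon: BilingualDict) -> List[str]:
    """Keep one translation per query term (here, arbitrarily, the first listed).

    Assumption: terms with no dictionary entry are passed through unchanged.
    """
    translated: List[str] = []
    for term in query:
        translations = lexicon.get(term) or [term]
        translated.append(translations[0])
    return translated


def translate_all(query: List[str], lexicon: BilingualDict) -> List[str]:
    """Keep every translation the dictionary lists for each query term.

    Assumption: terms with no dictionary entry are passed through unchanged.
    """
    translated: List[str] = []
    for term in query:
        translated.extend(lexicon.get(term) or [term])
    return translated


if __name__ == "__main__":
    # Toy dictionary for illustration only; not taken from the study.
    lexicon: BilingualDict = {
        "banco": ["bank", "bench"],
        "central": ["central"],
        "europeo": ["european"],
    }
    query = ["banco", "central", "europeo"]
    print(translate_single(query, lexicon))  # ['bank', 'central', 'european']
    print(translate_all(query, lexicon))     # ['bank', 'bench', 'central', 'european']
```

In this bag-of-words setting, the all-translations variant gives a source term with many listed translations more terms in the translated query, while the single-translation variant keeps the translated query the same length as the original.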