The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval
Abstract
Bilingual term lists are extensively used as a resource for
dictionary-based Cross-Language Information Retrieval (CLIR),
in which the goal is to find documents written in one natural language
based on queries that are expressed in another. This paper identifies
eight types of terms that affect retrieval effectiveness in CLIR
applications through their coverage by general-purpose bilingual term
lists, and reports results from an experimental evaluation of the coverage
of 35 bilingual term lists in a news retrieval application. Retrieval
effectiveness was found to be strongly influenced by term list size for
lists that contain between 3,000 and 30,000 unique terms per language.
Supplemental techniques for named entity translation were found to be
useful with even the largest lexicons. The contribution of named entity
translation was evaluated in a cross-language experiment involving English
and Chinese. Smaller effects were observed from deficiencies in the
coverage of domain-specific terminology when searching news stories.
UMIACS-TR-2003-22
LAMP-TR-097