A Categorial Variation Database for English
Abstract
We describe our approach to the construction and evaluation of a
large-scale database called "CatVar", which contains categorial
variations of English lexemes. Due to the prevalence of cross-language
categorial variation in multilingual applications, our
categorial-variation resource may serve as an integral part of a diverse
range of natural language applications. Thus, the research reported
herein overlaps heavily with that of the machine-translation,
lexicon-construction, and information-retrieval communities.
We apply the information-retrieval metrics of precision and recall to evaluate the accuracy and coverage of our database with respect to a human-produced gold standard. This evaluation reveals that the categorial database achieves a high degree of precision and recall. Additionally, we demonstrate that the database improves on the linkability of the Porter stemmer by over 30%.
UMIACS-TR-2003-13, LAMP-TR-095
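As a rough illustration of the precision and recall evaluation mentioned in the abstract, the sketch below scores one predicted cluster of categorial variants against a gold-standard cluster using plain set overlap. The function name `precision_recall`, the `word/TAG` notation, and the example clusters are assumptions made for this illustration; the abstract does not specify the exact scoring protocol, so this should not be read as the authors' procedure.

```python
def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of a predicted cluster against a gold cluster."""
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted members that the gold standard confirms.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold members that the prediction recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical clusters in an assumed "word/TAG" notation (not taken from CatVar).
predicted = {"hunger/N", "hunger/V", "hungry/AJ", "hung/V"}
gold = {"hunger/N", "hunger/V", "hungry/AJ", "hungrily/AV", "hungriness/N"}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four predicted entries appear in the gold cluster, and three of the five gold entries are recovered. A full evaluation would aggregate such scores over many clusters.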