A Categorial Variation Database for English
Date
2003-02-27
Authors
Habash, Nizar
Dorr, Bonnie
Abstract
We describe our approach to the construction and evaluation of a
large-scale database called "CatVar" which contains categorial
variations of English lexemes. Due to the prevalence of cross-language
categorial variation in multilingual applications, our
categorial-variation resource may serve as an integral part of a diverse
range of natural language applications. Thus, the research reported
herein overlaps heavily with that of the machine-translation,
lexicon-construction, and information-retrieval communities.
We apply the information-retrieval metrics of precision and recall to
evaluate the accuracy and coverage of our database with respect to a
human-produced gold standard. This evaluation reveals that the categorial
database achieves a high degree of precision and recall. Additionally, we
demonstrate that the database improves on the linkability of the Porter
stemmer by over 30%.
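As a rough illustration of the evaluation metrics named in the abstract, the sketch below computes precision and recall for one cluster of categorial variants against a gold-standard cluster. The function name, the `word_POS` entry format, and the example words are assumptions made for illustration; they are not taken from the database or the report, and the report's exact evaluation procedure may differ.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold  # entries that appear in both sets
    # Precision: fraction of predicted entries that are in the gold standard.
    precision = len(correct) / len(predicted) if predicted else 0.0
    # Recall: fraction of gold-standard entries that were predicted.
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical cluster as a database might list it (illustrative only).
database_cluster = {"hunger_N", "hunger_V", "hungry_AJ", "hungrily_AV"}
# Hypothetical human-produced gold standard for the same lexeme.
gold_cluster = {"hunger_N", "hunger_V", "hungry_AJ", "hungriness_N"}

p, r = precision_recall(database_cluster, gold_cluster)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this framing, precision reflects the accuracy of a cluster (how many listed variants are correct) and recall reflects its coverage (how many gold-standard variants were found).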
UMIACS-TR-2003-13
LAMP-TR-095