A Categorial Variation Database for English

Files
CS-TR-4443.ps (5.2 MB)
CS-TR-4443.pdf (104.36 KB)
Date
2003-02-27
Authors
Habash, Nizar
Dorr, Bonnie
Abstract
We describe our approach to the construction and evaluation of a large-scale database called "CatVar", which contains categorial variations of English lexemes. Because cross-language categorial variation is prevalent in multilingual applications, this resource can serve as an integral part of a diverse range of natural language applications. The research reported here therefore overlaps heavily with work in the machine-translation, lexicon-construction, and information-retrieval communities. We apply the information-retrieval metrics of precision and recall to evaluate the accuracy and coverage of the database against a human-produced gold standard. This evaluation shows that the database achieves high precision and recall. Additionally, we demonstrate that the database improves on the linkability of the Porter stemmer by over 30%.
UMIACS-TR-2003-13, LAMP-TR-095
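The abstract evaluates CatVar with precision and recall against a human-produced gold standard, but it does not say at what granularity the metrics are computed. The sketch below assumes one plausible choice: each cluster of categorial variants (words sharing a root across parts of speech) is turned into the set of word pairs it links, and precision and recall are computed over those pairs. The helper names `links` and `precision_recall`, the `word_POS` labels, and the toy clusters are hypothetical illustrations, not code or data from the report.

```python
from itertools import combinations


def links(clusters):
    """Return the set of unordered word pairs linked by any cluster.

    Assumption: a cluster of categorial variants is scored as the set
    of pairwise links it asserts between its members.
    """
    pairs = set()
    for cluster in clusters:
        # Sorting makes each pair canonical, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def precision_recall(predicted, gold):
    """Pairwise precision and recall of predicted clusters vs. gold clusters."""
    p_links = links(predicted)
    g_links = links(gold)
    hits = len(p_links & g_links)
    precision = hits / len(p_links) if p_links else 0.0
    recall = hits / len(g_links) if g_links else 0.0
    return precision, recall


# Hypothetical toy clusters (word_POS labels), not data from the report.
predicted = [
    {"hunger_V", "hunger_N", "hungry_AJ"},
    {"cook_V", "cook_N", "cooker_N"},
]
gold = [
    {"hunger_V", "hunger_N", "hungry_AJ", "hungriness_N"},
    {"cook_V", "cook_N"},
]

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.57
```

In this toy case the predicted clusters assert 6 links, of which 4 also appear among the 7 gold links, so precision is 4/6 (penalizing the spurious "cooker" links) and recall is 4/7 (penalizing the missed "hungriness" links).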