A Categorial Variation Database for English
dc.contributor.author | Habash, Nizar | en_US |
dc.contributor.author | Dorr, Bonnie | en_US |
dc.date.accessioned | 2004-05-31T23:25:28Z | |
dc.date.available | 2004-05-31T23:25:28Z | |
dc.date.created | 2003-02 | en_US |
dc.date.issued | 2003-02-27 | en_US |
dc.description.abstract | We describe our approach to the construction and evaluation of a large-scale database called "CatVar", which contains categorial variations of English lexemes. Because cross-language categorial variation is prevalent in multilingual applications, our categorial-variation resource may serve as an integral part of a diverse range of natural language applications. Thus, the research reported herein overlaps heavily with that of the machine-translation, lexicon-construction, and information-retrieval communities. We apply the information-retrieval metrics of precision and recall to evaluate the accuracy and coverage of our database with respect to a human-produced gold standard. This evaluation reveals that the categorial database achieves a high degree of precision and recall. Additionally, we demonstrate that the database improves on the linkability of the Porter stemmer by over 30%. | en_US |
dc.format.extent | 5452136 bytes | |
dc.format.mimetype | application/postscript | |
dc.identifier.uri | http://hdl.handle.net/1903/1258 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | en_US |
dc.relation.isAvailableAt | University of Maryland (College Park, Md.) | en_US |
dc.relation.isAvailableAt | Tech Reports in Computer Science and Engineering | en_US |
dc.relation.isAvailableAt | UMIACS Technical Reports | en_US |
dc.relation.ispartofseries | UM Computer Science Department; CS-TR-4443 | en_US |
dc.relation.ispartofseries | UMIACS; UMIACS-TR-2003-13 | en_US |
dc.relation.ispartofseries | LAMP-TR-095 | en_US |
dc.title | A Categorial Variation Database for English | en_US |
dc.type | Technical Report | en_US |