On Minimizing Training Corpus for Parser Acquisition

dc.contributor.author: Hwa, Rebecca [en_US]
dc.date.accessioned: 2004-05-31T23:11:12Z
dc.date.available: 2004-05-31T23:11:12Z
dc.date.created: 2001-05 [en_US]
dc.date.issued: 2001-09-05 [en_US]
dc.description.abstract: Many corpus-based natural language processing systems rely on large quantities of annotated text as training examples. Building this kind of resource is an expensive and labor-intensive project. To minimize the effort spent on annotating examples that are not helpful to the training process, recent research efforts have begun to apply active learning techniques to selectively choose the data to be annotated. In this work, we consider selecting training examples with the tree-entropy metric. Our goal is to assess how well this selection technique can be applied to training different types of parsers. We find that tree-entropy can significantly reduce the amount of training annotation for both a history-based parser and an EM-based parser. Moreover, the examples selected for the history-based parser are also good for training the EM-based parser, suggesting that the technique is parser independent. Cross-referenced as UMIACS-TR-2001-40. Cross-referenced as LAMP-TR-073. [en_US]
dc.format.extent: 178493 bytes
dc.format.mimetype: application/postscript
dc.identifier.uri: http://hdl.handle.net/1903/1138
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_US]
dc.relation.isAvailableAt: University of Maryland (College Park, Md.) [en_US]
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering [en_US]
dc.relation.isAvailableAt: UMIACS Technical Reports [en_US]
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4258 [en_US]
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2001-40 [en_US]
dc.relation.ispartofseries: LAMP-TR-073 [en_US]
dc.title: On Minimizing Training Corpus for Parser Acquisition [en_US]
dc.type: Technical Report [en_US]

Files

Original bundle

Name: CS-TR-4258.ps
Size: 174.31 KB
Format: Postscript Files

Name: CS-TR-4258.pdf
Size: 165.39 KB
Format: Adobe Portable Document Format
Description: Auto-generated copy of CS-TR-4258.ps