On Minimizing Training Corpus for Parser Acquisition

Hwa, Rebecca

dc.contributor.author	Hwa, Rebecca	en_US
dc.date.accessioned	2004-05-31T23:11:12Z
dc.date.available	2004-05-31T23:11:12Z
dc.date.created	2001-05	en_US
dc.date.issued	2001-09-05	en_US
dc.description.abstract	Many corpus-based natural language processing systems rely on using large quantities of annotated text as their training examples. Building this kind of resource is an expensive and labor-intensive project. To minimize effort spent on annotating examples that are not helpful to the training process, recent research efforts have begun to apply active learning techniques to selectively choose data to be annotated. In this work, we consider selecting training examples with the tree-entropy metric. Our goal is to assess how well this selection technique can be applied for training different types of parsers. We find that tree-entropy can significantly reduce the amount of training annotation for both a history-based parser and an EM-based parser. Moreover, the examples selected for the history-based parser are also good for training the EM-based parser, suggesting that the technique is parser-independent. (Cross-referenced as UMIACS-TR-2001-40 and as LAMP-TR-073.)	en_US
dc.format.extent	178493 bytes
dc.format.mimetype	application/postscript
dc.identifier.uri	http://hdl.handle.net/1903/1138
dc.language.iso	en_US
dc.relation.isAvailableAt	Digital Repository at the University of Maryland	en_US
dc.relation.isAvailableAt	University of Maryland (College Park, Md.)	en_US
dc.relation.isAvailableAt	Tech Reports in Computer Science and Engineering	en_US
dc.relation.isAvailableAt	UMIACS Technical Reports	en_US
dc.relation.ispartofseries	UM Computer Science Department; CS-TR-4258	en_US
dc.relation.ispartofseries	UMIACS; UMIACS-TR-2001-40	en_US
dc.relation.ispartofseries	LAMP-TR-073	en_US
dc.title	On Minimizing Training Corpus for Parser Acquisition	en_US
dc.type	Technical Report	en_US

Files

Original bundle

Name: CS-TR-4258.ps
Size: 174.31 KB
Format: Postscript Files

Name: CS-TR-4258.pdf
Size: 165.39 KB
Format: Adobe Portable Document Format
Description: Auto-generated copy of CS-TR-4258.ps

Collections

Technical Reports from UMIACS
Technical Reports of the Computer Science Department