On Minimizing Training Corpus for Parser Acquisition
Date
2001-09-05
Authors
Hwa, Rebecca
Abstract
Many corpus-based natural language processing systems rely on
large quantities of annotated text as their training examples.
Building this kind of resource is an expensive and labor-intensive
project. To minimize effort spent on annotating examples that are not
helpful to the training process, recent research efforts have begun to
apply active learning techniques to selectively choose data to be
annotated. In this work, we consider selecting training examples with
the tree-entropy metric. Our goal is to assess how well this
selection technique can be applied for training different types of
parsers. We find that tree-entropy can significantly reduce the
amount of training annotation for both a history-based parser and an
EM-based parser. Moreover, the examples selected for the
history-based parser are also good for training the EM-based parser,
suggesting that the technique is parser independent.
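
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: it assumes each unannotated sentence comes with an explicit list of scores for its candidate parses, treats tree entropy as the Shannon entropy of that normalized distribution, and ranks sentences by it. The names `tree_entropy` and `select_for_annotation` are hypothetical, and the report itself may define or normalize the metric differently.

```python
import math


def tree_entropy(parse_scores):
    """Shannon entropy (in bits) of one sentence's distribution over parses.

    parse_scores: non-negative scores for the candidate parses (assumed input;
    a real parser would supply these from its own model).
    """
    total = sum(parse_scores)
    if total <= 0:
        raise ValueError("parse scores must sum to a positive value")
    entropy = 0.0
    for score in parse_scores:
        if score > 0:
            prob = score / total  # normalize so the scores form a distribution
            entropy -= prob * math.log2(prob)
    return entropy


def select_for_annotation(candidates, k):
    """Return the k sentences whose parse distributions are most uncertain.

    candidates: list of (sentence, parse_scores) pairs.
    """
    ranked = sorted(candidates, key=lambda item: tree_entropy(item[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:k]]


# Toy pool: the parser is confident about s1 and s3, undecided about s2.
pool = [("s1", [0.9, 0.1]), ("s2", [0.5, 0.5]), ("s3", [1.0])]
chosen = select_for_annotation(pool, 1)
```

In this toy pool, `s2` is selected because its two parses are equally likely, so an annotation for it resolves the most uncertainty under the current model.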
Cross-referenced as UMIACS-TR-2001-40
Cross-referenced as LAMP-TR-073