Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets

Bloodgood, Michael; Vijay-Shanker, K

dc.contributor.author	Bloodgood, Michael
dc.contributor.author	Vijay-Shanker, K
dc.date.accessioned	2014-08-28T17:33:41Z
dc.date.available	2014-08-28T17:33:41Z
dc.date.issued	2009-06
dc.description.abstract	Actively sampled data can have very different characteristics than passively sampled data. Therefore, it is promising to investigate using different inference procedures during active learning (AL) than are used during passive learning (PL). This general idea is explored in detail for the focused case of AL with cost-weighted support vector machines (SVMs) for imbalanced data, a situation that arises for many human language technology (HLT) tasks. The key idea behind the proposed InitPA method for addressing imbalance is to base cost models during AL on an estimate of overall corpus imbalance computed via a small unbiased sample rather than on the imbalance in the labeled training data, which is the leading method used during PL.	en_US
dc.identifier	https://doi.org/10.13016/M2QP4K
dc.identifier.citation	Michael Bloodgood and K. Vijay-Shanker. 2009. Taking into account the differences between actively and passively acquired data: The case of active learning with support vector machines for imbalanced datasets. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 137-140, Boulder, Colorado, June. Association for Computational Linguistics.	en_US
dc.identifier.uri	http://hdl.handle.net/1903/15585
dc.language.iso	en_US	en_US
dc.publisher	Association for Computational Linguistics	en_US
dc.relation.isAvailableAt	Center for Advanced Study of Language
dc.relation.isAvailableAt	Digital Repository at the University of Maryland
dc.relation.isAvailableAt	University of Maryland (College Park, Md)
dc.subject	computer science	en_US
dc.subject	statistical methods	en_US
dc.subject	artificial intelligence	en_US
dc.subject	machine learning	en_US
dc.subject	computational linguistics	en_US
dc.subject	natural language processing	en_US
dc.subject	human language technology	en_US
dc.subject	text processing	en_US
dc.subject	active learning	en_US
dc.subject	selective sampling	en_US
dc.subject	query learning	en_US
dc.subject	binary classification	en_US
dc.subject	annotation bottleneck	en_US
dc.subject	annotation costs	en_US
dc.subject	support vector machines	en_US
dc.subject	SVMs	en_US
dc.subject	cost-weighted support vector machines	en_US
dc.subject	cost-weighted SVMs	en_US
dc.subject	imbalanced data	en_US
dc.subject	imbalanced datasets	en_US
dc.subject	corpus imbalance	en_US
dc.subject	imbalanced learning	en_US
dc.subject	asymmetric cost factors	en_US
dc.subject	asymmetric cost weights	en_US
dc.subject	positive amplification	en_US
dc.subject	cost models	en_US
dc.subject	cost modeling	en_US
dc.subject	cost-sensitive learning	en_US
dc.subject	cost-sensitive active learning	en_US
dc.subject	relation extraction	en_US
dc.subject	BioNLP	en_US
dc.subject	biomedical natural language processing	en_US
dc.subject	biomedical text processing	en_US
dc.subject	protein-protein interaction extraction	en_US
dc.subject	text classification	en_US
dc.subject	newswire text classification	en_US
dc.title	Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets	en_US
dc.type	Article	en_US
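
The contrast drawn in the abstract, between estimating imbalance from a small unbiased sample and estimating it from the actively labeled training data, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `positive_amplification`, the use of the negative-to-positive count ratio as the cost weight for the positive class, and the toy label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def positive_amplification(labels):
    """Return the ratio of negative to positive labels (labels are +1 or -1).

    Assumption: this ratio is used as the misclassification cost weight
    for the positive class in a cost-weighted SVM.
    """
    counts = Counter(labels)
    pos, neg = counts[1], counts[-1]
    if pos == 0:
        raise ValueError("need at least one positive example to estimate imbalance")
    return neg / pos


# Hypothetical small unbiased sample drawn at random from the corpus:
# 1 positive and 9 negatives, reflecting the overall corpus imbalance.
initial_random_sample = [1] + [-1] * 9

# Hypothetical labeled set after several rounds of active learning.
# Actively selected examples tend to be less imbalanced than the corpus.
actively_labeled = [1] * 6 + [-1] * 9

# Estimate from the unbiased sample (the idea attributed to InitPA above).
init_pa = positive_amplification(initial_random_sample)

# Estimate from the current labeled training data (the passive-learning habit).
current_pa = positive_amplification(actively_labeled)

print(f"weight from unbiased sample: {init_pa}")
print(f"weight from labeled data:    {current_pa}")
```

In this toy case the two estimates differ substantially, which is the situation the abstract describes. If scikit-learn is used, one option would be to pass the chosen weight to `sklearn.svm.SVC` through its `class_weight` parameter, for example `class_weight={1: init_pa, -1: 1.0}`.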

Files

Original bundle

Name: activeLearningSupportVectorMachinesImbalancedDatasetsNAACL2009.pdf
Size: 253.62 KB
Format: Adobe Portable Document Format

Collections

Center for Advanced Study of Language Research Works