Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets
dc.contributor.author | Bloodgood, Michael | |
dc.contributor.author | Vijay-Shanker, K | |
dc.date.accessioned | 2014-08-28T17:33:41Z | |
dc.date.available | 2014-08-28T17:33:41Z | |
dc.date.issued | 2009-06 | |
dc.description.abstract | Actively sampled data can have very different characteristics than passively sampled data. Therefore, it is promising to investigate using different inference procedures during active learning (AL) than are used during passive learning (PL). This general idea is explored in detail for the focused case of AL with cost-weighted SVMs for imbalanced data, a situation that arises for many human language technology (HLT) tasks. The key idea behind the proposed InitPA method for addressing imbalance is to base cost models during AL on an estimate of overall corpus imbalance computed via a small unbiased sample rather than the imbalance in the labeled training data, which is the leading method used during PL. | en_US |
dc.identifier | https://doi.org/10.13016/M2QP4K | |
dc.identifier.citation | Michael Bloodgood and K. Vijay-Shanker. 2009. Taking into account the differences between actively and passively acquired data: The case of active learning with support vector machines for imbalanced datasets. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 137-140, Boulder, Colorado, June. Association for Computational Linguistics. | en_US |
dc.identifier.uri | http://hdl.handle.net/1903/15585 | |
dc.language.iso | en_US | en_US |
dc.publisher | Association for Computational Linguistics | en_US |
dc.relation.isAvailableAt | Center for Advanced Study of Language | |
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | |
dc.relation.isAvailableAt | University of Maryland (College Park, Md) | |
dc.subject | computer science | en_US |
dc.subject | statistical methods | en_US |
dc.subject | artificial intelligence | en_US |
dc.subject | machine learning | en_US |
dc.subject | computational linguistics | en_US |
dc.subject | natural language processing | en_US |
dc.subject | human language technology | en_US |
dc.subject | text processing | en_US |
dc.subject | active learning | en_US |
dc.subject | selective sampling | en_US |
dc.subject | query learning | en_US |
dc.subject | binary classification | en_US |
dc.subject | annotation bottleneck | en_US |
dc.subject | annotation costs | en_US |
dc.subject | support vector machines | en_US |
dc.subject | SVMs | en_US |
dc.subject | cost-weighted support vector machines | en_US |
dc.subject | cost-weighted SVMs | en_US |
dc.subject | imbalanced data | en_US |
dc.subject | imbalanced datasets | en_US |
dc.subject | corpus imbalance | en_US |
dc.subject | imbalanced learning | en_US |
dc.subject | asymmetric cost factors | en_US |
dc.subject | asymmetric cost weights | en_US |
dc.subject | positive amplification | en_US |
dc.subject | cost models | en_US |
dc.subject | cost modeling | en_US |
dc.subject | cost-sensitive learning | en_US |
dc.subject | cost-sensitive active learning | en_US |
dc.subject | relation extraction | en_US |
dc.subject | BioNLP | en_US |
dc.subject | biomedical natural language processing | en_US |
dc.subject | biomedical text processing | en_US |
dc.subject | protein-protein interaction extraction | en_US |
dc.subject | text classification | en_US |
dc.subject | newswire text classification | en_US |
dc.title | Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets | en_US |
dc.type | Article | en_US |
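The key idea in the abstract, choosing where the imbalance estimate for the cost model comes from, can be illustrated with a minimal Python sketch. It assumes binary labels in {+1, -1} and assumes the positive-class cost factor of a cost-weighted SVM is set to the ratio of negatives to positives, a common heuristic; this record does not state the paper's exact cost model. The helper name cost_ratio and the toy label counts are hypothetical and illustrative only, not data or results from the paper.

```python
from collections import Counter
from typing import Sequence


def cost_ratio(labels: Sequence[int]) -> float:
    """Estimate class imbalance as (#negatives / #positives).

    Assumption (hypothetical helper): the positive-class cost factor of a
    cost-weighted SVM is set to this ratio; the paper's exact model may differ.
    """
    counts = Counter(labels)
    positives, negatives = counts[1], counts[-1]
    if positives == 0:
        raise ValueError("at least one positive label is required")
    return negatives / positives


# Hypothetical toy labels, illustrative only (not data from the paper).
# A small random sample drawn before active learning starts is unbiased,
# so it reflects the overall corpus imbalance.
initial_random_sample = [1] * 2 + [-1] * 18

# Actively sampled data can differ from the corpus; here the labeled pool
# after some AL rounds is assumed to be much more balanced.
al_labeled_pool = initial_random_sample + [1] * 38 + [-1] * 42

# Passive-learning style: recompute the ratio from the labeled training data.
pl_style_ratio = cost_ratio(al_labeled_pool)            # 60 / 40 = 1.5

# InitPA style: base the cost model on the small unbiased initial sample.
initpa_style_ratio = cost_ratio(initial_random_sample)  # 18 / 2 = 9.0

print(f"ratio from labeled training data: {pl_style_ratio}")
print(f"ratio from initial random sample: {initpa_style_ratio}")
```

With these toy counts the labeled pool suggests a cost ratio of 1.5 while the unbiased sample suggests 9.0; that gap is the kind of discrepancy between actively and passively acquired data that the abstract is concerned with.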
Files
Original bundle
- Name: activeLearningSupportVectorMachinesImbalancedDatasetsNAACL2009.pdf
- Size: 253.62 KB
- Format: Adobe Portable Document Format