Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets

dc.contributor.author: Bloodgood, Michael
dc.contributor.author: Vijay-Shanker, K
dc.date.accessioned: 2014-08-28T17:33:41Z
dc.date.available: 2014-08-28T17:33:41Z
dc.date.issued: 2009-06
dc.description.abstract: Actively sampled data can have very different characteristics than passively sampled data. Therefore, it is promising to investigate using different inference procedures during active learning (AL) than are used during passive learning (PL). This general idea is explored in detail for the focused case of AL with cost-weighted SVMs for imbalanced data, a situation that arises for many human language technology (HLT) tasks. The key idea behind the proposed InitPA method for addressing imbalance is to base cost models during AL on an estimate of overall corpus imbalance computed via a small unbiased sample, rather than on the imbalance in the labeled training data, which is the leading method used during PL. [en_US]
dc.identifier: https://doi.org/10.13016/M2QP4K
dc.identifier.citation: Michael Bloodgood and K. Vijay-Shanker. 2009. Taking into account the differences between actively and passively acquired data: The case of active learning with support vector machines for imbalanced datasets. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 137-140, Boulder, Colorado, June. Association for Computational Linguistics. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/15585
dc.language.iso: en_US [en_US]
dc.publisher: Association for Computational Linguistics [en_US]
dc.relation.isAvailableAt: Center for Advanced Study of Language
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: University of Maryland (College Park, Md)
dc.subject: computer science [en_US]
dc.subject: statistical methods [en_US]
dc.subject: artificial intelligence [en_US]
dc.subject: machine learning [en_US]
dc.subject: computational linguistics [en_US]
dc.subject: natural language processing [en_US]
dc.subject: human language technology [en_US]
dc.subject: text processing [en_US]
dc.subject: active learning [en_US]
dc.subject: selective sampling [en_US]
dc.subject: query learning [en_US]
dc.subject: binary classification [en_US]
dc.subject: annotation bottleneck [en_US]
dc.subject: annotation costs [en_US]
dc.subject: support vector machines [en_US]
dc.subject: SVMs [en_US]
dc.subject: cost-weighted support vector machines [en_US]
dc.subject: cost-weighted SVMs [en_US]
dc.subject: imbalanced data [en_US]
dc.subject: imbalanced datasets [en_US]
dc.subject: corpus imbalance [en_US]
dc.subject: imbalanced learning [en_US]
dc.subject: asymmetric cost factors [en_US]
dc.subject: asymmetric cost weights [en_US]
dc.subject: positive amplification [en_US]
dc.subject: cost models [en_US]
dc.subject: cost modeling [en_US]
dc.subject: cost-sensitive learning [en_US]
dc.subject: cost-sensitive active learning [en_US]
dc.subject: relation extraction [en_US]
dc.subject: BioNLP [en_US]
dc.subject: biomedical natural language processing [en_US]
dc.subject: biomedical text processing [en_US]
dc.subject: protein-protein interaction extraction [en_US]
dc.subject: text classification [en_US]
dc.subject: newswire text classification [en_US]
dc.title: Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets [en_US]
dc.type: Article [en_US]
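
The contrast drawn in the abstract, between estimating imbalance from a small unbiased sample and estimating it from the labeled training data, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the cost factor for the positive class is the ratio of negative to positive examples, which the abstract does not spell out, and every name and label list below (positive_cost_factor, initial_sample, actively_labeled) is hypothetical.

```python
def positive_cost_factor(labels):
    """Return the negative-to-positive ratio of a list of +1/-1 labels.

    Assumption: this ratio is used as the extra cost on positive examples.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    if n_pos == 0:
        raise ValueError("need at least one positive example to estimate imbalance")
    return n_neg / n_pos


# Hypothetical stand-in for a small unbiased (random) sample of the corpus:
# 20 positives and 180 negatives, so roughly 10% positive.
initial_sample = [1] * 20 + [-1] * 180

# Hypothetical labeled set after several rounds of active learning; the
# selection process has pulled in far more positives than the corpus rate.
actively_labeled = [1] * 40 + [-1] * 60

# Estimate taken once from the unbiased sample and held fixed during AL.
init_factor = positive_cost_factor(initial_sample)

# Estimate recomputed from the current labeled data, as is common in PL.
current_factor = positive_cost_factor(actively_labeled)

print(f"cost factor from unbiased sample:  {init_factor}")
print(f"cost factor from labeled AL data:  {current_factor}")
```

In this made-up scenario the two estimates differ by a factor of six, which is the kind of gap that motivates treating actively and passively acquired data differently. Either value could then be supplied to a cost-weighted SVM, for example through the class_weight parameter of scikit-learn's SVC, although that library choice is an assumption and not something the record states.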

Files

Original bundle
Name: activeLearningSupportVectorMachinesImbalancedDatasetsNAACL2009.pdf
Size: 253.62 KB
Format: Adobe Portable Document Format