Center for Advanced Study of Language Research Works
Browsing Center for Advanced Study of Language Research Works by Subject "asymmetric cost weights"
Now showing 1 - 2 of 2
Item: An Approach to Reducing Annotation Costs for BioNLP
(Association for Computational Linguistics, 2008-06) Bloodgood, Michael; Vijay-Shanker, K
There is a broad range of BioNLP tasks for which active learning (AL) can significantly reduce annotation costs, and a specific AL algorithm we have developed is particularly effective in reducing annotation costs for these tasks. We have previously developed an AL algorithm called ClosestInitPA that works best with tasks that have the following characteristics: redundancy in training material, burdensome annotation costs, Support Vector Machines (SVMs) work well for the task, and imbalanced datasets (i.e., when set up as a binary classification problem, one class is substantially rarer than the other). Many BioNLP tasks have these characteristics, and thus our AL algorithm is a natural approach to apply to BioNLP tasks.

Item: Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets
(Association for Computational Linguistics, 2009-06) Bloodgood, Michael; Vijay-Shanker, K
Actively sampled data can have very different characteristics than passively sampled data. Therefore, it is promising to investigate using different inference procedures during AL than are used during passive learning (PL). This general idea is explored in detail for the focused case of AL with cost-weighted SVMs for imbalanced data, a situation that arises for many HLT tasks. The key idea behind the proposed InitPA method for addressing imbalance is to base cost models during AL on an estimate of overall corpus imbalance computed via a small unbiased sample, rather than on the imbalance in the labeled training data, which is the leading method used during PL.
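The core idea in the second abstract can be sketched in a few lines. This is a hypothetical illustration of asymmetric cost weighting, not the authors' actual InitPA implementation: per-class misclassification costs are set inversely proportional to estimated class frequency, and the estimate comes from a small unbiased sample of the corpus rather than from the (typically more balanced) actively sampled labeled data.

```python
# Hypothetical sketch of the InitPA idea: derive asymmetric cost
# weights from an estimate of overall corpus imbalance (via a small
# unbiased sample) instead of from the imbalance in the actively
# labeled training data. Labels are +1 (rare) and -1 (common).

def cost_weights(labels, base_cost=1.0):
    """Return per-class costs inversely proportional to class frequency.

    The rarer positive class receives a cost scaled by the
    negative-to-positive ratio, so errors on it are penalized more.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return {1: base_cost * neg / pos, -1: base_cost}

# A small unbiased random sample from the corpus (hypothetical data,
# roughly 10% positive, mirroring a typical imbalanced HLT task):
unbiased_sample = [1, -1, -1, -1, -1, -1, -1, -1, -1, -1]

# Actively sampled labeled data tends to be far more balanced,
# so weights based on it would understate the true imbalance:
active_labels = [1, -1, 1, -1, 1, -1, -1, 1, -1, 1]

print(cost_weights(unbiased_sample))  # -> {1: 9.0, -1: 1.0}
print(cost_weights(active_labels))    # -> {1: 1.0, -1: 1.0}
```

These weights would then be passed to a cost-weighted SVM (e.g., as per-class C parameters); the contrast between the two printed dictionaries shows why basing the cost model on the labeled AL data alone can misrepresent the corpus.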