Analysis of Stopping Active Learning based on Stabilizing Predictions

dc.contributor.author: Bloodgood, Michael
dc.contributor.author: Grothendieck, John
dc.date.accessioned: 2014-07-14T21:10:24Z
dc.date.available: 2014-07-14T21:10:24Z
dc.date.issued: 2013-08
dc.description.abstract: Within the natural language processing (NLP) community, active learning has been widely investigated and applied in order to alleviate the annotation bottleneck faced by developers of new NLP systems and technologies. This paper presents the first theoretical analysis of stopping active learning based on stabilizing predictions (SP). The analysis has revealed three elements that are central to the success of the SP method: (1) bounds on Cohen’s Kappa agreement between successively trained models impose bounds on differences in F-measure performance of the models; (2) since the stop set does not have to be labeled, it can be made large in practice, helping to guarantee that the results transfer to previously unseen streams of examples at test/application time; and (3) good (low variance) sample estimates of Kappa between successive models can be obtained. Proofs of relationships between the level of Kappa agreement and the difference in performance between consecutive models are presented. Specifically, if the Kappa agreement between two models exceeds a threshold T (where T > 0), then the difference in F-measure performance between those models is bounded above by 4(1−T)/T in all cases. If precision of the positive conjunction of the models is assumed to be p, then the bound can be tightened to 4(1−T)/((p+1)T). [en_US]
dc.identifier.citation: Michael Bloodgood and John Grothendieck. 2013. Analysis of stopping active learning based on stabilizing predictions. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 10-19, Sofia, Bulgaria, August. Association for Computational Linguistics. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/15526
dc.language.iso: en_US [en_US]
dc.publisher: Association for Computational Linguistics [en_US]
dc.relation.isAvailableAt: Center for Advanced Study of Language
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: University of Maryland (College Park, Md)
dc.subject: computer science [en_US]
dc.subject: artificial intelligence [en_US]
dc.subject: machine learning [en_US]
dc.subject: active learning [en_US]
dc.subject: selective sampling [en_US]
dc.subject: query learning [en_US]
dc.subject: stopping criteria [en_US]
dc.subject: stopping methods [en_US]
dc.subject: stabilizing predictions [en_US]
dc.subject: statistical analysis [en_US]
dc.subject: performance bounds [en_US]
dc.subject: agreement statistics [en_US]
dc.subject: agreement metrics [en_US]
dc.subject: annotation bottleneck [en_US]
dc.subject: Kappa statistic [en_US]
dc.subject: Cohen's Kappa [en_US]
dc.subject: F-measure [en_US]
dc.subject: F-score [en_US]
dc.subject: relationship between Kappa and F-measure [en_US]
dc.subject: contingency table analysis [en_US]
dc.subject: natural language processing [en_US]
dc.subject: computational linguistics [en_US]
dc.title: Analysis of Stopping Active Learning based on Stabilizing Predictions [en_US]
dc.type: Article [en_US]
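
The quantities named in the abstract can be made concrete with a short Python sketch. It assumes binary (0/1) predictions from two successively trained models on the same unlabeled stop set. The function names `cohens_kappa`, `f_difference_bound`, `f_difference_bound_given_precision`, and `should_stop`, the example predictions, and the threshold value are hypothetical illustrations, not the authors' code; only the two bound formulas, 4(1−T)/T and 4(1−T)/((p+1)T), come from the abstract.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's Kappa between two binary (0/1) prediction sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of stop-set examples labelled identically.
    p_obs = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement, from each model's marginal rate of positive labels.
    pos_a = sum(a) / n
    pos_b = sum(b) / n
    p_exp = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    if p_exp == 1.0:
        # Both models assign the same single class to every example, so Kappa
        # is 0/0; this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


def f_difference_bound(t: float) -> float:
    """Bound 4(1 - T)/T on the F-measure difference when Kappa exceeds T > 0."""
    if t <= 0:
        raise ValueError("the bound requires T > 0")
    return 4 * (1 - t) / t


def f_difference_bound_given_precision(t: float, p: float) -> float:
    """Tightened bound 4(1 - T)/((p + 1)T), assuming the positive conjunction
    of the two models has precision p."""
    if t <= 0:
        raise ValueError("the bound requires T > 0")
    return 4 * (1 - t) / ((p + 1) * t)


def should_stop(prev_preds: Sequence[int], curr_preds: Sequence[int], t: float) -> bool:
    """Hypothetical stopping check: stop once Kappa between successive models'
    stop-set predictions exceeds T (strict '>' mirrors 'exceeds')."""
    return cohens_kappa(prev_preds, curr_preds) > t


if __name__ == "__main__":
    prev = [1, 1, 0, 0, 1, 0, 1, 0]  # made-up predictions of model i
    curr = [1, 1, 0, 0, 1, 0, 0, 0]  # made-up predictions of model i + 1
    print(cohens_kappa(prev, curr))         # 0.75
    print(should_stop(prev, curr, 0.99))    # False
    print(f_difference_bound(0.99))         # roughly 0.0404
    print(f_difference_bound_given_precision(0.99, 1.0))  # roughly 0.0202
```

Because p lies in [0, 1], the denominator (p+1)T ranges from T to 2T, so the tightened bound never exceeds the general one and is exactly half of it when p = 1. For label sets beyond binary, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same agreement statistic.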

Files

Original bundle
Name: analysisOfStoppingCoNLL2013.pdf
Size: 249.19 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.57 KB
Format: Item-specific license agreed to upon submission