An Approach to Reducing Annotation Costs for BioNLP

Authors: Bloodgood, Michael; Vijay-Shanker, K.

There is a broad range of BioNLP tasks for which active learning (AL) can significantly reduce annotation costs, and a specific AL algorithm we have developed is particularly effective in reducing annotation costs for these tasks. We have previously developed an AL algorithm called ClosestInitPA that works best with tasks that have the following characteristics: redundancy in training material, burdensome annotation costs, good performance from Support Vector Machines (SVMs), and imbalanced datasets (i.e., when set up as a binary classification problem, one class is substantially rarer than the other). Many BioNLP tasks have these characteristics, and thus our AL algorithm is a natural approach to apply to BioNLP tasks.

Language: en-US

Keywords: computer science; statistical methods; artificial intelligence; machine learning; computational linguistics; natural language processing; human language technology; text processing; active learning; selective sampling; query learning; annotation bottleneck; annotation costs; support vector machines; SVMs; cost-weighted support vector machines; cost-weighted SVMs; imbalanced data; imbalanced datasets; asymmetric cost factors; asymmetric cost weights; cost-sensitive learning; cost-sensitive active learning; imbalanced learning; BioNLP; biomedical natural language processing; biomedical text processing; protein-protein interaction extraction; Medline text classification; biomedical named entity recognition; biomedical NER; biomedical named entity classification

Type: Article