Title: Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation

Authors: Bloodgood, Michael; Callison-Burch, Chris

Abstract: We explore how to improve machine translation systems by adding more translation data in situations where we already have substantial resources. The main challenge is how to buck the trend of diminishing returns that is commonly encountered. We present an active learning-style data solicitation algorithm to meet this challenge. We test it, gathering annotations via Amazon Mechanical Turk, and find that we get an order of magnitude increase in performance rates of improvement.

Language: en-US

Keywords: computer science; statistical methods; artificial intelligence; machine learning; computational linguistics; natural language processing; human language technology; translation technology; machine translation; statistical machine translation; active learning; selective sampling; query learning; stopping criteria; stopping methods; crowdsourcing; cost-focused active learning; cost-efficient annotation; annotation costs; annotation bottleneck; annotation cost metrics; Urdu-English translation; uncertainty-based active learning; uncertainty-based sampling; Amazon Mechanical Turk; Highlighted N-Gram Method

Type: Article