Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation

Date
2010-07
Author
Bloodgood, Michael
Callison-Burch, Chris
Citation
Michael Bloodgood and Chris Callison-Burch. 2010. Bucking the trend: large-scale cost-focused active learning for statistical machine translation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 854-864, Uppsala, Sweden, July. Association for Computational Linguistics.
Abstract
We explore how to improve machine translation systems by adding more translation data in situations where we already have substantial resources. The main challenge is how to buck the trend of diminishing returns that is commonly encountered. We present an active learning-style data solicitation algorithm to meet this challenge. We test it, gathering annotations via Amazon Mechanical Turk, and find that we get an order of magnitude increase in performance rates of improvement.
Related items
Showing items related by title, author, creator and subject.
-
Improved Online Learning and Modeling for Feature-Rich Discriminative Machine Translation
Eidelman, Vladimir (2013)
Most modern statistical machine translation (SMT) systems learn how to translate by constructing a discriminative model based on statistics from the data. A growing number of methods for discriminative training have been ...
-
Searching to Translate and Translating to Search: When Information Retrieval Meets Machine Translation
Ture, Ferhan (2013)
With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human's reading and comprehension capabilities. As a result, search technologies have become a fundamental ...
-
Handling Translation Divergences in Generation-Heavy Hybrid Machine Translation
Habash, Nizar; Dorr, Bonnie (2002-04-04)
This paper describes a novel approach for handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The approach depends on the existence of rich target language resources such as ...