Translation memory retrieval methods

dc.contributor.author: Bloodgood, Michael
dc.contributor.author: Strauss, Benjamin
dc.date.accessioned: 2014-07-15T23:29:16Z
dc.date.available: 2014-07-15T23:29:16Z
dc.date.issued: 2014-04
dc.description.abstract: Translation Memory (TM) systems are one of the most widely used translation technologies. An important part of TM systems is the matching algorithm that determines what translations get retrieved from the bank of available translations to assist the human translator. Although detailed accounts of the matching algorithms used in commercial systems can’t be found in the literature, it is widely believed that edit distance algorithms are used. This paper investigates and evaluates the use of several matching algorithms, including the edit distance algorithm that is believed to be at the heart of most modern commercial TM systems. This paper presents results showing how well various matching algorithms correlate with human judgments of helpfulness (collected via crowdsourcing with Amazon’s Mechanical Turk). A new algorithm based on weighted n-gram precision that can be adjusted for translator length preferences consistently returns translations judged to be most helpful by translators for multiple domains and language pairs. [en_US]
dc.identifier.citation: Michael Bloodgood and Benjamin Strauss. 2014. Translation memory retrieval methods. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 202-210, Gothenburg, Sweden, April. Association for Computational Linguistics. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/15528
dc.language.iso: en_US [en_US]
dc.publisher: Association for Computational Linguistics [en_US]
dc.relation.isAvailableAt: Center for Advanced Study of Language
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: University of Maryland (College Park, Md)
dc.subject: computer science [en_US]
dc.subject: statistical methods [en_US]
dc.subject: computational linguistics [en_US]
dc.subject: information retrieval [en_US]
dc.subject: natural language processing [en_US]
dc.subject: human language technology [en_US]
dc.subject: translation technology [en_US]
dc.subject: computer-aided translation [en_US]
dc.subject: computer-assisted translation [en_US]
dc.subject: CAT tools [en_US]
dc.subject: translation memory systems [en_US]
dc.subject: translation memory retrieval methods [en_US]
dc.subject: Amazon Mechanical Turk [en_US]
dc.subject: matching algorithms [en_US]
dc.subject: fuzzy match [en_US]
dc.subject: fuzzy match algorithms [en_US]
dc.subject: fuzzy match score [en_US]
dc.subject: percent match [en_US]
dc.subject: weighted percent match [en_US]
dc.subject: edit distance [en_US]
dc.subject: n-gram precision [en_US]
dc.subject: weighted n-gram precision [en_US]
dc.subject: modified weighted n-gram precision [en_US]
dc.subject: translation match score threshold [en_US]
dc.subject: fuzzy match threshold [en_US]
dc.subject: fuzzy match score threshold [en_US]
dc.subject: match length preferences [en_US]
dc.subject: translation match length preferences [en_US]
dc.title: Translation memory retrieval methods [en_US]
dc.type: Article [en_US]
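The abstract contrasts edit-distance matching, which it describes as widely believed to underlie commercial TM systems, with a weighted n-gram precision score that can be adjusted for translator length preferences. The Python sketch below illustrates both ideas under stated assumptions; it is not the formulation used in the paper. It assumes whitespace tokenization, a fuzzy match score of 1 - ED / max(length) (a common convention, not confirmed by this record), IDF weights computed from the TM source side, and a hypothetical `alpha` parameter that interpolates the n-gram denominator between the candidate's and the query's weight as a stand-in for the paper's length-preference adjustment. All function names and the toy TM are hypothetical.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance over token sequences; insert/delete/substitute cost 1."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(query, candidate):
    """Assumed edit-distance fuzzy match: 1 - ED / max(len); 1.0 means exact match."""
    longest = max(len(query), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(query, candidate) / longest


def idf_weights(segments):
    """Hypothetical helper: IDF = log(N / df) + 1 per token over TM source segments."""
    df = Counter(tok for seg in segments for tok in set(seg))
    n = len(segments)
    return {tok: math.log(n / count) + 1.0 for tok, count in df.items()}


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_ngram_score(query, candidate, idf, max_n=4, alpha=1.0, default_idf=1.0):
    """IDF-weighted n-gram overlap, averaged over n = 1..max_n.

    alpha = 1.0 divides by the candidate's n-gram weight (pure precision);
    alpha = 0.0 divides by the query's (pure recall). The interpolation is an
    assumed way to express a length preference, not the paper's formula.
    """
    def weight(gram):
        # An n-gram's weight is the sum of its tokens' IDF (assumption).
        return sum(idf.get(tok, default_idf) for tok in gram)

    scores = []
    for n in range(1, max_n + 1):
        q, c = ngram_counts(query, n), ngram_counts(candidate, n)
        matched = sum(weight(g) * min(cnt, q[g]) for g, cnt in c.items())
        cand_mass = sum(weight(g) * cnt for g, cnt in c.items())
        query_mass = sum(weight(g) * cnt for g, cnt in q.items())
        denom = alpha * cand_mass + (1 - alpha) * query_mass
        if denom > 0:  # skip orders where neither side has n-grams to count
            scores.append(matched / denom)
    return sum(scores) / len(scores) if scores else 0.0


def retrieve(query, tm, scorer, threshold=0.0):
    """Return (score, source, target) for the best TM entry at or above threshold, else None."""
    best = None
    for source, target in tm:
        s = scorer(query, source)
        if s >= threshold and (best is None or s > best[0]):
            best = (s, source, target)
    return best


# Toy TM of (source tokens, target) pairs, invented purely for illustration.
tm = [
    ("the cat sat on the mat".split(), "le chat est assis sur le tapis"),
    ("the dog sat on the mat".split(), "le chien est assis sur le tapis"),
    ("close the door".split(), "fermez la porte"),
]
idf = idf_weights([src for src, _ in tm])
query = "the cat sat on the red mat".split()

best_ed = retrieve(query, tm, fuzzy_match_score, threshold=0.7)
best_ng = retrieve(query, tm, lambda q, c: weighted_ngram_score(q, c, idf, alpha=0.5))
print(best_ed[0], best_ed[2])
print(best_ng[0], best_ng[2])
```

With `alpha=1.0`, a short candidate such as "the cat" scores as a perfect match for the longer query, because every n-gram it contains also appears in the query; lowering `alpha` toward 0 penalizes it for covering little of the query. This interpolation is only one way to encode a length preference.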

Files

Original bundle
Name: translationMemoryEACL2014.pdf
Size: 192.92 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.57 KB
Format: Item-specific license agreed to upon submission