Structured local exponential models for machine translation

Subotin, Michael

dc.contributor.advisor	Resnik, Philip	en_US
dc.contributor.author	Subotin, Michael	en_US
dc.contributor.department	Linguistics	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2011-12-01T06:30:35Z
dc.date.available	2011-12-01T06:30:35Z
dc.date.issued	2010	en_US
dc.description.abstract	This thesis proposes a synthesis and generalization of local exponential translation models, the subclass of feature-rich translation models which associate probability distributions with individual rewrite rules used by the translation system, such as synchronous context-free rules, or with other individual aspects of translation hypotheses such as word pairs or reordering events. Unlike other authors, we use these estimates to replace the traditional phrase models and lexical scores, rather than in addition to them, thereby demonstrating that the local exponential phrase models can be regarded as a generalization of standard methods not only in theoretical but also in practical terms. We further introduce a form of local translation models that combine features associated with surface forms of rules and features associated with less specific representations -- including those based on lemmas, inflections, and reordering patterns -- such that surface-form estimates are recovered as a special case of the model. Crucially, the proposed approach allows estimation of parameters for the latter type of features from training sets that include multiple source phrases, thereby overcoming an important training set fragmentation problem which hampers previously proposed local translation models. These proposals are experimentally validated. Conditioning all phrase-based probabilities in a hierarchical phrase-based system on source-side contextual information produces significant performance improvements. Extending the contextually sensitive estimates with features modeling source-side morphology and reordering patterns yields consistent additional improvements, while further experiments show significant improvements obtained from modeling observed and unobserved inflections for a morphologically rich target language.	en_US
dc.identifier.uri	http://hdl.handle.net/1903/12150
dc.subject.pqcontrolled	Computer science	en_US
dc.subject.pqcontrolled	Linguistics	en_US
dc.subject.pqcontrolled	Artificial intelligence	en_US
dc.subject.pquncontrolled	Exponential models	en_US
dc.subject.pquncontrolled	Machine learning	en_US
dc.subject.pquncontrolled	Machine translation	en_US
dc.subject.pquncontrolled	Maximum entropy	en_US
dc.title	Structured local exponential models for machine translation	en_US
dc.type	Thesis	en_US

Files

Original bundle

Name: Subotin_umd_0117N_11851.pdf
Size: 321.7 KB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Linguistics Theses and Dissertations