Show simple item record

Machine Translation by Pattern Matching

dc.contributor.advisorResnik, Philip Sen_US
dc.contributor.authorLopez, Adam Daviden_US
dc.date.accessioned2008-06-20T05:34:54Z
dc.date.available2008-06-20T05:34:54Z
dc.date.issued2008-04-22en_US
dc.identifier.urihttp://hdl.handle.net/1903/8110
dc.description.abstractThe best systems for machine translation of natural language are based on statistical models learned from data. Conventional representation of a statistical translation model requires substantial offline computation and representation in main memory. Therefore, the principal bottlenecks to the amount of data we can exploit and the complexity of models we can use are available memory and CPU time, and current state of the art already pushes these limits. With data size and model complexity continually increasing, a scalable solution to this problem is central to future improvement. Callison-Burch et al. (2005) and Zhang and Vogel (2005) proposed a solution that we call "translation by pattern matching", which we bring to fruition in this dissertation. The training data itself serves as a proxy to the model; rules and parameters are computed on demand. It achieves our desiderata of minimal offline computation and compact representation, but is dependent on fast pattern matching algorithms on text. They demonstrated its application to a common model based on the translation of contiguous substrings, but leave some open problems. Among these is a question: can this approach match the performance of conventional methods despite unavoidable differences that it induces in the model? We show how to answer this question affirmatively. The main open problem we address is much harder. Many translation models are based on the translation of discontiguous substrings. The best pattern matching algorithm for these models is much too slow, taking several minutes per sentence. We develop new algorithms that reduce empirical computation time by two orders of magnitude for these models, making translation by pattern matching widely applicable. We use these algorithms to build a model that is two orders of magnitude larger than the current state of the art and substantially outperforms a strong competitor in Chinese-English translation. We show that a conventional representation of this model would be impractical. Our experiments shed light on some interesting properties of the underlying model. The dissertation also includes the most comprehensive contemporary survey of statistical machine translation.en_US
dc.format.extent1284226 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoen_US
dc.titleMachine Translation by Pattern Matchingen_US
dc.typeDissertationen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.contributor.departmentComputer Scienceen_US
dc.subject.pqcontrolledComputer Scienceen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record