Searching to Translate and Translating to Search: When Information Retrieval Meets Machine Translation

dc.contributor.advisor: Lin, Jimmy [en_US]
dc.contributor.author: Ture, Ferhan [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2013-10-03T05:32:39Z
dc.date.available: 2013-10-03T05:32:39Z
dc.date.issued: 2013 [en_US]
dc.description.abstract: With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human's reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information in multiple languages, introducing another barrier between people and information. Therefore, search technologies need to handle content written in multiple languages, which requires techniques to account for the linguistic differences. Information Retrieval (IR) is the study of search techniques, in which the task is to find material relevant to a given information need. Cross-Language Information Retrieval (CLIR) is a special case of IR when the search takes place in a multi-lingual collection. Of course, it is not helpful to retrieve content in languages the user cannot understand. Machine Translation (MT) studies the translation of text from one language into another efficiently (within a reasonable amount of time) and effectively (fluent and retaining the original meaning), which helps people understand what is being written, regardless of the source language. Putting these together, we observe that search and translation technologies are part of an important user application, calling for a better integration of search (IR) and translation (MT), since these two technologies need to work together to produce high-quality output. In this dissertation, the main goal is to build better connections between IR and MT, for which we present solutions to two problems: Searching to translate explores approximate search techniques for extracting bilingual data from multilingual Wikipedia collections to train better translation models. Translating to search explores the integration of a modern statistical MT system into the cross-language search processes.
In both cases, our best-performing approach yielded improvements over strong baselines for a variety of language pairs. Finally, we propose a general architecture, in which various components of IR and MT systems can be connected together into a feedback loop, with potential improvements to both search and translation tasks. We hope that the ideas presented in this dissertation will spur more interest in the integration of search and translation technologies. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/14502
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pqcontrolled: Artificial intelligence [en_US]
dc.subject.pqcontrolled: Information science [en_US]
dc.subject.pquncontrolled: cross-language [en_US]
dc.subject.pquncontrolled: information retrieval [en_US]
dc.subject.pquncontrolled: ivory [en_US]
dc.subject.pquncontrolled: locality sensitive hashing [en_US]
dc.subject.pquncontrolled: machine translation [en_US]
dc.subject.pquncontrolled: mapreduce [en_US]
dc.title: Searching to Translate and Translating to Search: When Information Retrieval Meets Machine Translation [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle
Name: Ture_umd_0117E_14453.pdf
Size: 1.55 MB
Format: Adobe Portable Document Format