DETECTING FINE-GRAINED SEMANTIC DIVERGENCES TO IMPROVE TRANSLATION UNDERSTANDING ACROSS LANGUAGES

dc.contributor.advisorCarpuat, Marineen_US
dc.contributor.authorBriakou, Eleftheriaen_US
dc.contributor.departmentComputer Scienceen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2023-10-07T05:36:47Z
dc.date.available2023-10-07T05:36:47Z
dc.date.issued2023en_US
dc.description.abstractOne of the core goals of Natural Language Processing (NLP) is to develop computationalrepresentations and methods to compare and contrast text meaning across languages. Such methods are essential to many NLP tasks, such as question answering and information retrieval. One of the limitations of those methods is the lack of sensitivity to detecting fine-grained semantic divergences, i.e., fine-meaning differences in sentences that overlap in content. Yet, such differences abound even in parallel texts, i.e., texts in two different languages that are typically perceived as exact translations of each other. Detecting such fine-grained semantic divergences across languages matters for machine translation systems, as they yield challenging training samples and for humans, who can benefit from a nuanced understanding of the source. In this thesis, we focus on detecting fine-grained semantic divergences in parallel textsto improve machine and human translation understanding. In our first piece of work, we start by providing empirical evidence that such small meaning differences exist and can be reliably annotated both at a sentence and at a sub-sentential level. Then, we show that they can be automatically detected by fine-tuning large pre-trained language models without supervision by learning to rank synthetic divergences of varying granularity. In our second piece of work, we turn to analyzing the impact of fine-grained divergences on Neural Machine Translation (NMT) training and show that they negatively impact several aspects of NMT outputs, e.g., translation quality and confidence. Based on these findings, we present two orthogonal approaches to mitigating the negative impact of divergences and improve machine translation quality: first, we introduce a divergent-aware NMT framework that models divergences at training time; second, we present generation-based approaches for revising divergences in mined parallel texts to make the corresponding references more equivalent in meaning. After exploring how subtle meaning differences in parallel texts impact machine translationsystems, we switch gears to understand how divergence detection can be used by humans directly. In our last piece of work, we extend our divergence detection methods to explain divergences from a human-centered perspective. We introduce a lightweight iterative algorithm that extracts contrastive phrasal highlights, i.e., highlights of segments indicating where divergences reside within bilingual texts, by explicitly formalizing the alignment between them. We show that our approach produces contrastive phrasal highlights that match human-provided rationales of divergences better than prior explainability approaches. Finally, based on extensive application-grounded evaluations, we show that contrastive phrasal highlights help bilingual speakers detect fine-grained meaning differences in human-translated texts, as well as critical errors due to local mistranslations in machine-translated texts.en_US
dc.identifierhttps://doi.org/10.13016/dspace/pngn-xs8m
dc.identifier.urihttp://hdl.handle.net/1903/30837
dc.language.isoenen_US
dc.subject.pqcontrolledComputer scienceen_US
dc.subject.pquncontrolledcross-lingual semanticsen_US
dc.subject.pquncontrolledmachine translationen_US
dc.subject.pquncontrollednatural language processingen_US
dc.titleDETECTING FINE-GRAINED SEMANTIC DIVERGENCES TO IMPROVE TRANSLATION UNDERSTANDING ACROSS LANGUAGESen_US
dc.typeDissertationen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Briakou_umd_0117E_23667.pdf
Size:
6.77 MB
Format:
Adobe Portable Document Format