DETECTING FINE-GRAINED SEMANTIC DIVERGENCES TO IMPROVE TRANSLATION UNDERSTANDING ACROSS LANGUAGES
Date
2023
Authors
Briakou, Eleftheria
Advisor
Carpuat, Marine
Abstract
One of the core goals of Natural Language Processing (NLP) is to develop computational representations and methods for comparing and contrasting text meaning across languages. Such methods are essential to many NLP tasks, such as question answering and information retrieval. One of their limitations is a lack of sensitivity to fine-grained semantic divergences, i.e., small meaning differences between sentences that largely overlap in content. Yet such differences abound even in parallel texts, i.e., texts in two different languages that are typically perceived as exact translations of each other. Detecting these fine-grained semantic divergences across languages matters both for machine translation systems, for which they yield challenging training samples, and for humans, who can benefit from a more nuanced understanding of the source.
In this thesis, we focus on detecting fine-grained semantic divergences in parallel texts to improve machine and human translation understanding. In our first piece of work, we provide empirical evidence that such small meaning differences exist and can be reliably annotated at both the sentence and sub-sentential levels. We then show that they can be detected automatically, without supervision, by fine-tuning large pre-trained language models to rank synthetic divergences of varying granularity. In our second piece of work, we analyze the impact of fine-grained divergences on Neural Machine Translation (NMT) training and show that they degrade several aspects of NMT outputs, e.g., translation quality and confidence. Based on these findings, we present two orthogonal approaches to mitigating the negative impact of divergences and improving machine translation quality: first, we introduce a divergent-aware NMT framework that models divergences at training time; second, we present generation-based approaches for revising divergences in mined parallel texts so that the corresponding references are more equivalent in meaning.
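To make the ranking idea above concrete, the sketch below shows one way such unsupervised fine-tuning could be set up: a multilingual encoder scores a sentence pair for equivalence, and a margin ranking loss pushes the original parallel pair above a synthetically perturbed, more divergent one. The model name, scoring head, and hyperparameters are illustrative assumptions, not the exact configuration used in the thesis.

import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# Illustrative sketch only: a multilingual encoder plus a scalar head that
# scores how semantically equivalent a source/target sentence pair is.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")
scorer = nn.Linear(encoder.config.hidden_size, 1)  # equivalence score head

def score_pair(src: str, tgt: str) -> torch.Tensor:
    """Encode the sentence pair jointly and map the first token's state to a scalar score."""
    batch = tokenizer(src, tgt, return_tensors="pt", truncation=True)
    cls_state = encoder(**batch).last_hidden_state[:, 0]
    return scorer(cls_state).squeeze(-1)

margin_loss = nn.MarginRankingLoss(margin=1.0)

def ranking_step(src: str, tgt_equivalent: str, tgt_divergent: str) -> torch.Tensor:
    """One training step: the (near-)equivalent pair must outrank the divergent one."""
    pos = score_pair(src, tgt_equivalent)   # original parallel pair
    neg = score_pair(src, tgt_divergent)    # synthetically perturbed, more divergent pair
    target = torch.ones_like(pos)           # +1 => first argument should be ranked higher
    return margin_loss(pos, neg, target)

In practice, the divergent side of each training pair would be produced by perturbations of varying granularity applied to the target sentence (e.g., deleting or substituting a phrase), which is what supplies supervision without any manually labeled divergences.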
After exploring how subtle meaning differences in parallel texts affect machine translation systems, we turn to how divergence detection can be used by humans directly. In our last piece of work, we extend our divergence detection methods to explain divergences from a human-centered perspective. We introduce a lightweight iterative algorithm that extracts contrastive phrasal highlights, i.e., highlighted segments indicating where divergences reside within bilingual texts, by explicitly formalizing the alignment between them. We show that our approach produces contrastive phrasal highlights that match human-provided rationales for divergences better than prior explainability approaches. Finally, based on extensive application-grounded evaluations, we show that contrastive phrasal highlights help bilingual speakers detect fine-grained meaning differences in human-translated texts, as well as critical errors due to local mistranslations in machine-translated texts.
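As a rough illustration of what a phrasal highlight looks like once per-token divergence decisions are available, the snippet below simply groups contiguous divergent tokens into highlighted spans. It is a post-processing sketch under the assumption that an upstream model supplies token-level divergent/equivalent labels; it is not the iterative alignment-based extraction algorithm described above.

from typing import List, Tuple

def contrastive_highlights(tokens: List[str], divergent: List[bool]) -> List[Tuple[int, int, str]]:
    """Group contiguous tokens flagged as divergent into (start, end, phrase) highlights.

    Assumes an upstream model has already labeled each token as divergent or
    equivalent with respect to the other side of the bilingual text.
    """
    spans, start = [], None
    for i, flag in enumerate(divergent + [False]):  # trailing sentinel closes an open span
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    return spans

# Hypothetical example: the last two target tokens are unsupported by the source.
tgt_tokens = ["the", "minister", "resigned", "last", "Tuesday", "under", "pressure"]
tgt_labels = [False, False, False, False, False, True, True]
print(contrastive_highlights(tgt_tokens, tgt_labels))
# [(5, 7, 'under pressure')]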