When Good MT Goes Bad: Understanding and Mitigating Misleading Machine Translations

Date

2024

Abstract

Machine Translation (MT) has long been viewed as a force multiplier, enabling monolingual users to assist in processing foreign language text. In ideal situations, Neural MT (NMT) provides unprecedented MT quality, potentially increasing productivity and user acceptance of the technology. Outside of ideal circumstances, however, NMT introduces new types of errors that may be difficult to recognize for users who do not understand the source language, resulting in misleading output. This dissertation seeks to understand the prevalence, nature, and impact of potentially misleading output, and whether a simple intervention can mitigate its effects on monolingual users.

To understand the prevalence of misleading MT output, we conduct a study to quantify the potential impact of output that is fluent but not adequate, or "fluently inadequate," by observing the relative frequency of these errors in two types of MT models: statistical models and early neural models. We find that neural models are consistently more prone to this type of error than traditional statistical models. However, improving the overall quality of the MT system, such as through domain adaptation, reduces these errors.

We examine the nature of misleading MT output by moving from an intrinsic feature (fluency) to a more user-centered feature, believability, defined as a monolingual user's perception of the likelihood that the meaning of the MT output matches the meaning of the input, without the user understanding the source. We find that fluency accounts for most believability judgments, but semantic features such as plausibility also play a role.

Finally, we turn to mitigating the impacts of potentially misleading NMT output. We propose two simple interventions to help users more effectively handle inadequate output: providing output from a second NMT system and providing output from a rule-based MT (RBMT) system. We test these interventions for one use case with a user study designed to mimic typical intelligence analysis triage workflows and with actual intelligence analysts as participants. We see significant increases in performance on relevance judgment tasks with output from two NMT systems and in performance on relevant entity identification tasks with the addition of RBMT output.
