Stronger Inductive Biases for Sample-Efficient and Controllable Neural Machine Translation

Date

2023

Abstract

As one of the oldest applications of natural language processing, machine translation (MT) has a growing impact on human lives, both as an end application and as a key component of cross-lingual information processing tasks such as cross-lingual information retrieval and dialogue generation. Although neural machine translation (NMT) models achieve impressive performance on some language pairs, they require large amounts of human translations for training. In addition, they are notorious for generating fluent outputs that do not faithfully reflect the meaning of the source sentence, and they give users little control over the outputs. To address these issues, this thesis contributes techniques for building more sample-efficient and controllable NMT models by incorporating stronger inductive biases that help correct undesirable biases, integrate prior knowledge, and introduce flexible ways to control NMT outputs.

In our first line of research, we show that current NMT models are susceptible to undesirable biases that hinder sample-efficient training and lead to unfaithful translations. We further provide evidence that these undesirable biases can be mitigated by integrating stronger inductive biases through training algorithms. We start by introducing a new training objective to address the exposure bias problem, a common problem in sequence generation models in which errors accumulate along the generated sequence at inference time, especially when training data is limited. Next, we turn to a well-known but less studied problem in MT, the hallucination problem: translation outputs that are unrelated to the source text. To find the spurious biases that cause hallucination errors, we first identify model symptoms that are indicative of hallucinations at inference time. We then show how these symptoms connect to spurious biases at training time, where the model learns to predict the ground-truth translation while ignoring a large part of the source sentence. These findings suggest a path toward mitigating hallucinations by addressing these spurious biases.
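
To make the exposure bias problem concrete, the sketch below contrasts teacher-forced training, which conditions each step on the gold prefix, with free-running inference, which conditions on the model's own previous predictions; it is this train/inference mismatch that lets early mistakes compound. The toy PyTorch decoder and function names are ours, introduced purely for illustration, and the sketch is not the thesis's model or training objective.

    import torch
    import torch.nn as nn

    class ToyDecoder(nn.Module):
        """A deliberately tiny decoder, just enough to show the two regimes."""
        def __init__(self, vocab_size=100, hidden=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.rnn = nn.GRUCell(hidden, hidden)
            self.out = nn.Linear(hidden, vocab_size)

        def step(self, token, state):
            state = self.rnn(self.embed(token), state)
            return self.out(state), state

    def teacher_forced_loss(decoder, target, state):
        # Training: every step conditions on the gold prefix target[:, :t].
        loss, inp = 0.0, target[:, 0]
        for t in range(1, target.size(1)):
            logits, state = decoder.step(inp, state)
            loss = loss + nn.functional.cross_entropy(logits, target[:, t])
            inp = target[:, t]       # gold token, never the model's own guess
        return loss / (target.size(1) - 1)

    def free_running_decode(decoder, bos, state, max_len=10):
        # Inference: every step conditions on the model's own last prediction,
        # a prefix distribution the model never saw under teacher forcing.
        out, inp = [], bos
        for _ in range(max_len):
            logits, state = decoder.step(inp, state)
            inp = logits.argmax(-1)  # the model's own guess feeds the next step
            out.append(inp)
        return torch.stack(out, dim=1)

Because the decoder is never trained on its own, possibly erroneous, prefixes, a single early mistake at inference time shifts every later step off the training distribution; this is the gap that the new training objective targets.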

In our second line of research, we study how to incorporate stronger inductive biases into NMT to effectively integrate language priors estimated from unlabeled data. We introduce a novel semi-supervised learning objective with a theoretical guarantee on its global optimum, and we show that it can be effectively approximated and improves performance in practice.
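
As a rough illustration of the general idea only, one generic way to inject a language prior estimated from unlabeled text is to regularize the translation model's output distribution toward a frozen language model. The function below is our own sketch; the thesis's actual objective, its weighting scheme, and its global-optimum guarantee are not reproduced here, and alpha is an assumed hyperparameter.

    import torch
    import torch.nn.functional as F

    def semi_supervised_loss(tm_logits, gold, lm_logits, alpha=0.5):
        """tm_logits: (B, T, V) translation-model scores on a labeled batch.
        gold:        (B, T)    reference target tokens.
        lm_logits:   (B, T, V) scores from a frozen LM trained on unlabeled text.
        alpha:       weight of the prior-matching term (assumed hyperparameter).
        """
        # Standard supervised term: negative log-likelihood of the reference.
        nll = F.cross_entropy(tm_logits.transpose(1, 2), gold)
        # Prior term: KL divergence between the translation model's next-token
        # distribution and the language-model prior, averaged over the batch.
        kl = F.kl_div(F.log_softmax(tm_logits, dim=-1),
                      F.log_softmax(lm_logits, dim=-1),
                      log_target=True, reduction="batchmean")
        return nll + alpha * kl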

Finally, we study inductive biases in the form of NMT model architectures that allow end users to control model outputs more easily. Controlling the outputs of standard NMT models is difficult and typically incurs high computational cost at training or inference time. We develop an edit-based NMT model with novel edit operations that can incorporate users' lexical constraints at low computational cost at both training and inference time. To let users provide lexical constraints in more flexible morphological forms, we further introduce a modular framework for inflecting and integrating lexical constraints in NMT.
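
To illustrate the flavor of edit-based generation with lexical constraints, the sketch below applies a predicted sequence of keep/delete/insert operations to a draft translation while refusing to delete user-supplied constraint tokens. The operation set, the Edit and apply_edits names, and the protected-token mechanism are all hypothetical simplifications, not the thesis's actual edit operations.

    from dataclasses import dataclass

    @dataclass
    class Edit:
        op: str          # "keep", "delete", or "insert"
        token: str = ""  # token to insert (used by "insert" ops only)

    def apply_edits(draft, edits, protected):
        """Apply edit operations to a draft translation, never deleting
        a user-supplied constraint token in `protected`."""
        out, i = [], 0
        for e in edits:
            if e.op == "insert":
                out.append(e.token)
            elif i < len(draft):
                if e.op == "keep" or draft[i] in protected:
                    out.append(draft[i])   # constraints always survive
                i += 1
        out.extend(draft[i:])              # pass through any unedited suffix
        return out

    draft = ["die", "Katze", "sitzt", "auf", "dem", "Teppich"]
    edits = [Edit("keep"), Edit("keep"), Edit("delete"),
             Edit("insert", "liegt"), Edit("keep"), Edit("keep"), Edit("keep")]
    print(apply_edits(draft, edits, protected={"Katze"}))
    # ['die', 'Katze', 'liegt', 'auf', 'dem', 'Teppich']

The appeal in this setting is that constraints enter through the same cheap local edit operations used for ordinary decoding, rather than through an expensive constrained search, which is consistent with the low training- and inference-time cost described above.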
