Handling Translation Divergences in Generation-Heavy Hybrid Machine Translation

Files
CS-TR-4341.ps (191.5 KB)
CS-TR-4341.pdf (205.78 KB)
Date
2002-04-04
Authors
Habash, Nizar
Dorr, Bonnie
Abstract
This paper describes a novel approach for handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The approach depends on the existence of rich target-language resources, such as word lexical semantics, including information about categorial variations and subcategorization frames. These resources are used to generate multiple structural variations from a target-glossed lexico-syntactic representation of the source-language sentence; the variations account for different translation divergences. The resulting overgeneration is constrained by a target-language model that uses corpus-based statistics. Exploiting target-language resources (symbolic and statistical) to handle a problem usually reserved for transfer-based and interlingual MT is useful when translating from structurally divergent source languages with scarce linguistic resources. A preliminary evaluation of this approach applied to Spanish-English MT indicates that it is extremely promising. The approach, however, is not limited to MT: it can be extended to monolingual NLG applications such as summarization.
Also UMIACS-TR-2002-23 and LAMP-TR-083.
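The abstract describes a two-stage idea: overgenerate structural variations using target-language lexical resources, then let a corpus-based target-language model choose among them. The sketch below illustrates that idea on a standard categorial-divergence example, Spanish "tener hambre" (glossed "have hunger"), which English expresses as "be hungry". It is a minimal, hypothetical illustration and not the GHMT system itself: the tables `CATVAR`, `LIGHT_VERBS`, `FIRST_PERSON`, and `CORPUS` and the functions `overgenerate`, `BigramLM`, and `rank` are invented for this example, and the add-one-smoothed bigram model is a swapped-in simplification rather than the report's actual language model.

```python
import math
from collections import Counter

# Hypothetical toy resources for illustration only; they are not taken from the report.
# Categorial-variation table: word -> {syntactic category: surface form}.
CATVAR = {"hunger": {"N": "hunger", "V": "hunger", "ADJ": "hungry"}}
# Hypothetical light verbs whose object noun may carry the main meaning.
LIGHT_VERBS = {"have", "give", "make", "take"}
# First-person-singular present-tense forms (toy morphology, subject "i" only).
FIRST_PERSON = {"have": "have", "be": "am", "hunger": "hunger"}
# Tiny target-language "corpus" standing in for real corpus statistics.
CORPUS = [
    "i am hungry",
    "i am hungry now",
    "i am tired",
    "you are hungry",
    "i have a car",
]


def overgenerate(subj, verb, obj):
    """Expand a glossed (subject, verb, object) triple into candidate word sequences.

    Each candidate is one structural variation; only the first keeps the source structure.
    """
    candidates = [[subj, FIRST_PERSON[verb], obj]]  # literal gloss order
    forms = CATVAR.get(obj, {})
    if verb in LIGHT_VERBS:
        if "ADJ" in forms:  # light verb + noun -> copula + adjective
            candidates.append([subj, FIRST_PERSON["be"], forms["ADJ"]])
        if "V" in forms:  # light verb + noun -> full verb
            candidates.append([subj, FIRST_PERSON[forms["V"]]])
    return candidates


class BigramLM:
    """Add-one-smoothed bigram model with <s> and </s> boundary tokens."""

    def __init__(self, sentences):
        self.history = Counter()  # how often each token is followed by another token
        self.bigrams = Counter()
        vocab = set()
        for sent in sentences:
            words = sent.split()
            vocab.update(words)
            tokens = ["<s>"] + words + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab) + 1  # possible next tokens: corpus words plus </s>

    def logprob(self, words):
        """Natural-log probability of a word sequence under the model."""
        tokens = ["<s>"] + list(words) + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + self.vocab_size))
            for a, b in zip(tokens, tokens[1:])
        )


def rank(candidates, lm):
    """Order candidates from most to least probable under the language model."""
    return sorted(candidates, key=lm.logprob, reverse=True)


if __name__ == "__main__":
    lm = BigramLM(CORPUS)
    for words in rank(overgenerate("i", "have", "hunger"), lm):
        print(" ".join(words), round(lm.logprob(words), 3))
```

Run as a script, this prints the three candidates with "i am hungry" first, since the toy corpus supports its bigrams, while the literal "i have hunger" ranks last. Add-one smoothing is used only because it is the simplest way to keep unseen bigrams such as "have hunger" from receiving zero probability; a realistic system would need far larger corpora and a stronger model.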