Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation
Abstract
This paper describes a novel approach to handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The translation divergence problem is usually reserved for Transfer and Interlingual MT because it requires a large combination of complex lexical and structural mappings. A major requirement of these approaches is the accessibility of large amounts of explicit symmetrical knowledge for both source and target languages. This limitation renders Transfer and Interlingual approaches ineffective in the face of structurally-divergent language pairs with asymmetrical resources. GHMT addresses the more common form of this problem, source-poor/target-rich, by fully exploiting symbolic and statistical target-language resources. This is accomplished by using target-language lexical semantics, categorial variations and subcategorization frames to overgenerate multiple lexico-structural variations from a target-glossed syntactic dependency of the source-language sentence. The symbolic overgeneration, which accounts for different possible translation divergences, is constrained by a statistical target-language model. (Also LAMP-TR-088) (Also UMIACS-TR-2002-49)
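The overgenerate-then-rank idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the GHMT system: the candidate realizations in `CANDIDATES`, the four-sentence `CORPUS`, and the add-one-smoothed bigram model that stands in for the statistical target-language model. The function names (`overgenerate`, `train_bigram`, `log_prob`, `best_translation`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical target-language realizations for each slot of a glossed
# source sentence (toy data, not GHMT's actual lexical resources).
CANDIDATES = [
    ["John"],
    ["has hunger", "is hungry", "hungers"],
]

# Tiny illustrative target-language corpus for the statistical model.
CORPUS = [
    "John is hungry",
    "Mary is hungry",
    "John is tired",
    "the dog is hungry",
]


def overgenerate(slots):
    """Symbolic step: enumerate every combination of slot realizations."""
    return [" ".join(choice) for choice in product(*slots)]


def train_bigram(corpus):
    """Collect context counts, bigram counts, and vocabulary size."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])          # counts of bigram contexts
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(set(unigrams) | {"</s>"})
    return unigrams, bigrams, vocab_size


def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability of a candidate sentence."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def best_translation(slots, model):
    """Statistical step: keep the candidate the language model prefers."""
    return max(overgenerate(slots), key=lambda s: log_prob(s, model))


if __name__ == "__main__":
    model = train_bigram(CORPUS)
    for candidate in overgenerate(CANDIDATES):
        print(f"{log_prob(candidate, model):8.3f}  {candidate}")
    print("best:", best_translation(CANDIDATES, model))
```

In this toy setting the symbolic step proposes three lexico-structural variants and the language model selects among them; a real system would draw the variants from lexical semantics, categorial variations, and subcategorization frames, and would use a far larger model.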