Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation

Habash, Nizar; Dorr, Bonnie

Files

CS-TR-4369.pdf (187.4 KB)

Date

2002-05-22

Authors

Habash, Nizar

Dorr, Bonnie

Abstract

This paper describes a novel approach to handling translation divergences in a Generation-Heavy Hybrid Machine Translation (GHMT) system. The translation divergence problem is usually reserved for Transfer and Interlingual MT because it requires a large combination of complex lexical and structural mappings. A major requirement of these approaches is the accessibility of large amounts of explicit symmetrical knowledge for both source and target languages. This limitation renders Transfer and Interlingual approaches ineffective in the face of structurally divergent language pairs with asymmetrical resources. GHMT addresses the more common form of this problem, source-poor/target-rich, by fully exploiting symbolic and statistical target-language resources. This is accomplished by using target-language lexical semantics, categorial variations, and subcategorization frames to overgenerate multiple lexico-structural variations from a target-glossed syntactic dependency of the source-language sentence. The symbolic overgeneration, which accounts for different possible translation divergences, is constrained by a statistical target-language model. (Also LAMP-TR-088) (Also UMIACS-TR-2002-49)
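The overgenerate-then-rank idea in the abstract can be illustrated with a minimal Python sketch. It is not the system described in the report: the sentence, the `CATVAR` table, the three-line corpus, and every function name are hypothetical, and a bigram model with crude add-one smoothing stands in for whatever statistical target-language model GHMT actually uses. A literal target gloss ("i gave stabs to john") competes with a conflated variant produced by a noun-to-verb categorial variation ("i stabbed john"), and the language model picks between them.

```python
import math
from collections import Counter

# Hypothetical toy resources; none of these names or data come from the report.
CATVAR = {"stabs": "stabbed"}  # noun -> related verb form (categorial variation)

CORPUS = [
    "<s> i stabbed john </s>",
    "<s> mary stabbed the intruder </s>",
    "<s> i gave books to john </s>",
]

def overgenerate(subj, verb, obj, iobj):
    """Return the literal rendering plus a conflated variant when one exists."""
    candidates = [f"{subj} {verb} {obj} to {iobj}"]
    if obj in CATVAR:
        # Conflation: the object noun is realized as the main verb instead.
        candidates.append(f"{subj} {CATVAR[obj]} {iobj}")
    return candidates

def train_bigram(corpus):
    """Collect context counts, bigram counts, and vocabulary size."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        toks = line.split()
        unigrams.update(toks[:-1])          # every token that has a successor
        bigrams.update(zip(toks, toks[1:]))
    vocab = {t for line in corpus for t in line.split()}
    return unigrams, bigrams, len(vocab)

def logprob(sentence, model):
    """Add-one smoothed bigram log-probability (a crude toy estimate)."""
    unigrams, bigrams, v = model
    toks = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
        for a, b in zip(toks, toks[1:])
    )

def best_translation(candidates, model):
    """Pick the candidate the target-language model scores highest."""
    return max(candidates, key=lambda s: logprob(s, model))

model = train_bigram(CORPUS)
candidates = overgenerate("i", "gave", "stabs", "john")
print(best_translation(candidates, model))
```

In this sketch all divergence knowledge sits on the target side (the variation table and the corpus), which mirrors the source-poor/target-rich setting the abstract describes. Note that unnormalized sentence probabilities favor shorter candidates, so a real ranker would need to account for length.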

URI (handle)

http://hdl.handle.net/1903/1202

Collections

Technical Reports from UMIACS
Technical Reports of the Computer Science Department
