A Statistical Word-Level Translation Model for Comparable Corpora

Diab, Mona; Finch, Steve

Files

CS-TR-4150.ps (670.71 KB)

CS-TR-4150.pdf (89.43 KB)

Date

2000-06-17

Abstract

In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach rests on the assumption that if two terms have close distributional profiles, the distributional profiles of their translations should also be close in a comparable corpus. We describe the proposed model and report a preliminary investigation on intralanguage comparable corpora. The preliminary results exceed 92% accuracy, suggesting that the model is feasible. The model still needs some improvements and should be tested cross-linguistically before its significance can be assessed. (Also cross-referenced as UMIACS-TR-2000-41, LAMP-TR-048)
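The abstract does not say how distributional profiles are built or how closeness is measured. Purely as an illustration of the stated assumption, the sketch below treats a word's profile as the counts of words co-occurring within a fixed window and measures closeness with cosine similarity. It then scores a candidate word mapping by how well pairwise similarities in the source corpus carry over to the mapped words in the target corpus (lower is more consistent). The window size, the cosine measure, and the helper names `profiles`, `cosine`, and `agreement` are hypothetical choices, not the report's method.

```python
from collections import Counter
import math


def profiles(tokens, window=2):
    """Hypothetical profile: for each word, count the words seen within +/- window positions."""
    prof = {}
    for i, w in enumerate(tokens):
        ctx = prof.setdefault(w, Counter())
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                ctx[tokens[j]] += 1
    return prof


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agreement(prof_src, prof_tgt, mapping):
    """Mean absolute gap between sim(x, y) in the source corpus and
    sim(mapping[x], mapping[y]) in the target corpus, over all word pairs.
    Assumes every key and value of `mapping` has a profile in its corpus."""
    words = list(mapping)
    diffs = []
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            s = cosine(prof_src[words[i]], prof_src[words[j]])
            t = cosine(prof_tgt[mapping[words[i]]], prof_tgt[mapping[words[j]]])
            diffs.append(abs(s - t))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

In an intralanguage setting like the one the abstract mentions, the correct mapping between two comparable corpora is the identity, so a sketch like this could be checked by asking whether the identity mapping scores better than scrambled alternatives.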

URI (handle)

http://hdl.handle.net/1903/1081

Collections

Technical Reports from UMIACS
Technical Reports of the Computer Science Department