A Statistical Word-Level Translation Model for Comparable Corpora


Files

CS-TR-4150.ps (670.71 KB)
CS-TR-4150.pdf (89.43 KB)

Date

2000-06-17

Abstract

In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach rests on the assumption that if two terms have close distributional profiles, then the distributional profiles of their translations should also be close in a comparable corpus. We describe the proposed model and lay out a preliminary investigation on intralanguage comparable corpora. The preliminary results are more than 92% accurate, suggesting that the model is feasible. The model needs further improvement and should be tested cross-linguistically before its significance can be assessed. (Also cross-referenced as UMIACS-TR-2000-41, LAMP-TR-048)
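
The distributional assumption in the abstract can be illustrated with a minimal sketch. This is not the model from the report, whose details are not given on this page. It assumes, purely for illustration, that a term's distributional profile is a bag of co-occurring words within a fixed window and that closeness is measured by cosine similarity. The function names `profile` and `cosine`, the window size, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def profile(tokens, term, window=2):
    """Count the words that occur within `window` positions of `term`.

    Assumption: a bag-of-words context window stands in for the
    "distributional profile"; the report may define it differently.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_p * norm_q)


# Hypothetical toy corpus, tokenized by whitespace.
tokens = (
    "the doctor treated the patient "
    "the physician treated the patient "
    "the river flooded the town"
).split()

sim_related = cosine(profile(tokens, "doctor"), profile(tokens, "physician"))
sim_unrelated = cosine(profile(tokens, "doctor"), profile(tokens, "river"))
print(sim_related > sim_unrelated)
```

In this toy setting, "doctor" and "physician" share more context words than "doctor" and "river", so their profiles score as closer. Under the abstract's assumption, the same kind of comparison would be carried across the two sides of a comparable corpus to propose word-level mappings.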