Use of OCR for Rapid Construction of Bilingual Lexicons

Files

CS-TR-4510.pdf (379.45 KB)

Date

2003-09-25

Abstract

This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries in order to construct resources suited for machine translation (MT) of low-density languages, for which online resources are limited. A rule-based method and an HMM-based method are used to rapidly construct MT lexicons from the systematic structural clues provided in the original dictionary. We evaluate the effectiveness of these techniques and conclude that: (1) the rule-based method performs better on dictionaries with a simple structure; (2) the stochastic (HMM-based) method performs better on dictionaries with an enriched structure; (3) regardless of the degree of dictionary richness, the rule-based method gives better results for phrasal entries than for single-word entries; and (4) our resulting bilingual lexicons are comprehensive enough to provide reasonable MT results when compared to human-constructed lexicons. (LAMP-TR-104) (CAR-TR-986) (UMIACS-TR-2003-78)
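
To illustrate what HMM-based labeling of dictionary structure can look like, the sketch below tags the tokens of one OCRed entry with structural roles using the Viterbi algorithm. It is not the authors' model. The states (HEAD, POS, TRANS, SEP), the coarse token classes, the POS abbreviation set, and every probability are hypothetical, hand-set values chosen for the example; in practice such parameters would typically be estimated from annotated entries and would use richer OCR clues such as font style.

```python
import math

# Hypothetical structural roles for tokens in a dictionary entry.
STATES = ("HEAD", "POS", "TRANS", "SEP")
# Hypothetical part-of-speech abbreviations that appear in entries.
POS_ABBREVIATIONS = {"n.", "v.", "adj.", "adv."}

def token_class(tok):
    """Map a raw OCR token to a coarse observation class."""
    if tok in POS_ABBREVIATIONS:
        return "POSTAG"
    if tok in {";", ","}:
        return "PUNCT"
    return "WORD"

# Hand-set (hypothetical) HMM parameters; all nonzero so log() is safe.
START = {"HEAD": 0.9, "POS": 0.05, "TRANS": 0.04, "SEP": 0.01}
TRANSITION = {
    "HEAD":  {"HEAD": 0.1,  "POS": 0.8,  "TRANS": 0.09, "SEP": 0.01},
    "POS":   {"HEAD": 0.01, "POS": 0.04, "TRANS": 0.9,  "SEP": 0.05},
    "TRANS": {"HEAD": 0.01, "POS": 0.04, "TRANS": 0.6,  "SEP": 0.35},
    "SEP":   {"HEAD": 0.01, "POS": 0.09, "TRANS": 0.85, "SEP": 0.05},
}
EMIT = {
    "HEAD":  {"WORD": 0.9,  "POSTAG": 0.05, "PUNCT": 0.05},
    "POS":   {"WORD": 0.05, "POSTAG": 0.9,  "PUNCT": 0.05},
    "TRANS": {"WORD": 0.9,  "POSTAG": 0.05, "PUNCT": 0.05},
    "SEP":   {"WORD": 0.05, "POSTAG": 0.05, "PUNCT": 0.9},
}

def viterbi(tokens):
    """Return the most probable state sequence for the given tokens."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # best[i][s]: log probability of the best path ending in state s at i.
    best = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = [{}]
    for i in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in STATES:
            # Choose the predecessor state that maximizes the path score.
            prev, score = max(
                ((p, best[i - 1][p] + math.log(TRANSITION[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            best[i][s] = score + math.log(EMIT[s][obs[i]])
            back[i][s] = prev
    # Backtrack from the best final state.
    path = [max(STATES, key=lambda s: best[-1][s])]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

print(viterbi("casa n. house ; home".split()))
# ['HEAD', 'POS', 'TRANS', 'SEP', 'TRANS']
```

The plain word tokens "casa", "house", and "home" are ambiguous on their own. The transition probabilities resolve them: an entry-initial word is most likely a headword, and words following a POS tag or a separator are most likely translations.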
