Use of OCR for Rapid Construction of Bilingual Lexicons
Date
2003-09-25
Authors
Karagol-Ayan, Burcu
Doermann, David
Dorr, Bonnie
Abstract
This paper describes an approach to analyzing the lexical structure of OCRed
bilingual dictionaries to construct resources suited for machine translation
of low-density languages, where online resources are limited. A rule-based
and an HMM-based method are used for rapid construction of MT lexicons based
on systematic structural clues provided in the original dictionary. We
evaluate the effectiveness of our techniques, concluding that: (1) the
rule-based method performs better on dictionaries with a simple structure;
(2) the stochastic method performs better on dictionaries with an enriched
structure; (3) regardless of the degree of dictionary richness, the
rule-based method gives better results for phrasal entries than for
single-word entries; and (4) our resulting bilingual lexicons are
comprehensive enough to provide reasonable MT results when compared to
human-constructed lexicons.
(LAMP-TR-104)
(CAR-TR-986)
(UMIACS-TR-2003-78)