Use of OCR for Rapid Construction of Bilingual Lexicons

dc.contributor.author: Karagol-Ayan, Burcu (en_US)
dc.contributor.author: Doermann, David (en_US)
dc.contributor.author: Dorr, Bonnie (en_US)
dc.description.abstract: This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based method and an HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better on dictionaries with a simple structure; (2) the stochastic method performs better on dictionaries with an enriched structure; (3) regardless of the degree of dictionary richness, the rule-based method gives better results for phrasal entries than for single-word entries; and (4) our resulting bilingual lexicons are comprehensive enough to provide reasonable MT results when compared to human-constructed lexicons. (LAMP-TR-104) (CAR-TR-986) (UMIACS-TR-2003-78) (en_US)
dc.format.extent: 388555 bytes
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4510 (en_US)
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2003-78 (en_US)
dc.title: Use of OCR for Rapid Construction of Bilingual Lexicons (en_US)
dc.type: Technical Report (en_US)
dc.relation.isAvailableAt: Digital Repository at the University of Maryland (en_US)
dc.relation.isAvailableAt: University of Maryland (College Park, Md.) (en_US)
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering (en_US)
dc.relation.isAvailableAt: UMIACS Technical Reports (en_US)
