Use of OCR for Rapid Construction of Bilingual Lexicons
dc.contributor.author | Karagol-Ayan, Burcu | en_US |
dc.contributor.author | Doermann, David | en_US |
dc.contributor.author | Dorr, Bonnie | en_US |
dc.date.accessioned | 2004-05-31T23:31:05Z | |
dc.date.available | 2004-05-31T23:31:05Z | |
dc.date.created | 2003-07 | en_US |
dc.date.issued | 2003-09-25 | en_US |
dc.description.abstract | This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based and an HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better on dictionaries with a simple structure; (2) the stochastic method performs better on dictionaries with an enriched structure; (3) regardless of the degree of dictionary richness, the rule-based method gives better results for phrasal entries than for single-word entries; and (4) our resulting bilingual lexicons are comprehensive enough to provide reasonable MT results when compared to human-constructed lexicons. (LAMP-TR-104) (CAR-TR-986) (UMIACS-TR-2003-78) | en_US |
dc.format.extent | 388555 bytes | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/1903/1302 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | en_US |
dc.relation.isAvailableAt | University of Maryland (College Park, Md.) | en_US |
dc.relation.isAvailableAt | Tech Reports in Computer Science and Engineering | en_US |
dc.relation.isAvailableAt | UMIACS Technical Reports | en_US |
dc.relation.ispartofseries | UM Computer Science Department; CS-TR-4510 | en_US |
dc.relation.ispartofseries | LAMP-TR-104 | en_US |
dc.relation.ispartofseries | CAR-TR-986 | en_US |
dc.relation.ispartofseries | UMIACS; UMIACS-TR-2003-78 | en_US |
dc.title | Use of OCR for Rapid Construction of Bilingual Lexicons | en_US |
dc.type | Technical Report | en_US |