PARSING AND TAGGING OF BILINGUAL DICTIONARY

DC Field | Value | Language
dc.contributor.author | MA, HUANFENG | en_US
dc.contributor.author | KARAGOL-AYAN, BURCU | en_US
dc.contributor.author | DOERMANN, DAVID | en_US
dc.date.accessioned | 2004-05-31T23:32:36Z |
dc.date.available | 2004-05-31T23:32:36Z |
dc.date.created | 2003-09 | en_US
dc.date.issued | 2003-09-25 | en_US
dc.description.abstract | Bilingual dictionaries hold great potential as a source of lexical resources for training and testing automated systems for optical character recognition, machine translation, and cross-language information retrieval. In this paper, we describe a system for extracting term lexicons from printed bilingual dictionaries. Our work was divided into three phases: dictionary segmentation, entry tagging, and generation. In segmentation, pages are divided into logical entries based on structural features learned from selected examples. The extracted entries are associated with functional labels and passed to a tagging module that associates linguistic labels with each word or phrase in the entry. The output of the system is a structure that represents the entries from the dictionary. We have used this approach to parse a variety of dictionaries with both Latin and non-Latin alphabets, and we demonstrate the results of term lexicon generation for retrieval from a collection of French news stories using English queries. (LAMP-TR-106) (CAR-TR-991) (UMIACS-TR-2003-97) | en_US
dc.format.extent | 912783 bytes |
dc.format.mimetype | application/pdf |
dc.identifier.uri | http://hdl.handle.net/1903/1314 |
dc.language.iso | en_US |
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | en_US
dc.relation.isAvailableAt | University of Maryland (College Park, Md.) | en_US
dc.relation.isAvailableAt | Tech Reports in Computer Science and Engineering | en_US
dc.relation.isAvailableAt | UMIACS Technical Reports | en_US
dc.relation.ispartofseries | UM Computer Science Department; CS-TR-4529 | en_US
dc.relation.ispartofseries | LAMP-TR-106 | en_US
dc.relation.ispartofseries | CAR-TR-991 | en_US
dc.relation.ispartofseries | UMIACS; UMIACS-TR-2003-97 | en_US
dc.title | PARSING AND TAGGING OF BILINGUAL DICTIONARY | en_US
dc.type | Technical Report | en_US
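The abstract describes a system whose output is a structured representation of dictionary entries, from which a term lexicon is generated and used to retrieve French news stories with English queries. The following is a minimal Python sketch of that last step only, not the authors' implementation. The `Entry` record, its `headword`/`pos`/`translations` fields, and the `build_lexicon` and `translate_query` helpers are all hypothetical; the sketch assumes each tagged entry has already been reduced to an English headword plus its French translations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical tagged dictionary entry (not the report's actual output schema)."""
    headword: str  # source-language term (assumed English here)
    pos: str       # part-of-speech label assumed to come from the tagging phase
    translations: list[str] = field(default_factory=list)  # target-language terms (assumed French)


def build_lexicon(entries):
    """Collapse tagged entries into a term lexicon: lowercased headword -> de-duplicated translations."""
    lexicon = defaultdict(list)
    for entry in entries:
        key = entry.headword.lower()
        for term in entry.translations:
            # Keep the first occurrence of each translation, preserving dictionary order.
            if term not in lexicon[key]:
                lexicon[key].append(term)
    return dict(lexicon)


def translate_query(query, lexicon):
    """Expand each query word into all of its lexicon translations; keep unknown words unchanged."""
    translated = []
    for word in query.lower().split():
        translated.extend(lexicon.get(word, [word]))
    return translated


if __name__ == "__main__":
    entries = [
        Entry("election", "n", ["élection", "scrutin"]),
        Entry("vote", "n", ["vote", "scrutin"]),
        Entry("vote", "v", ["voter"]),
    ]
    lexicon = build_lexicon(entries)
    print(translate_query("Election vote results", lexicon))
    # prints ['élection', 'scrutin', 'vote', 'scrutin', 'voter', 'results']
```

Expanding every query word into all of its listed translations is only the simplest query-translation strategy; weighting or selecting among translations would be a separate design choice and is not shown here.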

Files

Original bundle
Name: CS-TR-4529.pdf
Size: 891.39 KB
Format: Adobe Portable Document Format