PARSING AND TAGGING OF BILINGUAL DICTIONARIES
Date
2003-09-25
Authors
MA, HUANFENG
KARAGOL-AYAN, BURCU
DOERMANN, DAVID
Abstract
Bilingual dictionaries hold great potential as a source of lexical resources
for training and testing automated systems for optical character
recognition, machine translation, and cross-language information
retrieval. In this paper, we describe a system for extracting term
lexicons from printed bilingual dictionaries. Our work was divided into
three phases: dictionary segmentation, entry tagging, and lexicon generation. In
segmentation, pages are divided into logical entries based on structural
features learned from selected examples. The extracted entries are
assigned functional labels and passed to a tagging module, which
attaches linguistic labels to each word or phrase in the entry. The
output of the system is a structure that represents the entries from the
dictionary. We have used this approach to parse a variety of dictionaries
with both Latin and non-Latin alphabets, and we demonstrate the results of
term lexicon generation for cross-language retrieval from a collection of
French news stories using English queries.
(LAMP-TR-106)
(CAR-TR-991)
(UMIACS-TR-2003-97)
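The three phases in the abstract (segmenting pages into entries using structural features learned from examples, tagging the parts of each entry, and emitting a structured record) can be illustrated with a toy sketch. This is not the authors' method. It assumes that each OCR'd line carries only two layout features, its left offset and whether its first word is bold, that entries use a hanging indent, that the headword is a single token optionally followed by a part-of-speech abbreviation from a small hypothetical set, and that translations are separated by semicolons. All names (`Line`, `EntryModel`, `learn_model`, `segment`, `tag_entry`, `POS_TAGS`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Line:
    """One OCR'd text line with two hypothetical layout features."""
    text: str
    x_offset: float    # left margin of the line, in pixels
    starts_bold: bool  # whether the first word is set in bold


@dataclass
class EntryModel:
    """Structural features of entry-start lines, learned from examples."""
    x_threshold: float
    require_bold: bool


def learn_model(examples: list[tuple[Line, bool]]) -> EntryModel:
    """Learn entry-start features from (line, is_entry_start) examples."""
    starts = [ln for ln, is_start in examples if is_start]
    conts = [ln for ln, is_start in examples if not is_start]
    # Assume a hanging indent: entry-start lines sit further left than
    # continuation lines, so split at the midpoint of the two mean offsets.
    threshold = (mean(l.x_offset for l in starts)
                 + mean(l.x_offset for l in conts)) / 2
    # Require a bold first word only if every example start line had one.
    require_bold = all(l.starts_bold for l in starts)
    return EntryModel(threshold, require_bold)


def is_entry_start(line: Line, model: EntryModel) -> bool:
    if line.x_offset > model.x_threshold:
        return False
    return line.starts_bold or not model.require_bold


def segment(lines: list[Line], model: EntryModel) -> list[str]:
    """Group page lines into logical entries."""
    entries: list[list[str]] = []
    for line in lines:
        # A leading continuation line (e.g. carried over from the previous
        # page) still opens a new entry, since there is nothing to attach to.
        if is_entry_start(line, model) or not entries:
            entries.append([line.text])
        else:
            entries[-1].append(line.text)
    return [" ".join(parts) for parts in entries]


POS_TAGS = {"n.", "v.", "adj.", "adv."}  # hypothetical tag inventory


def tag_entry(entry: str) -> dict:
    """Label the headword, part of speech, and translations of one entry."""
    tokens = entry.split()
    record = {"headword": tokens[0], "pos": None, "translations": []}
    rest = tokens[1:]
    if rest and rest[0] in POS_TAGS:
        record["pos"] = rest[0]
        rest = rest[1:]
    # Translations are assumed to be separated by semicolons.
    record["translations"] = [t.strip() for t in " ".join(rest).split(";")
                              if t.strip()]
    return record


# Usage: learn from a few labeled lines, then parse a new page.
examples = [
    (Line("chat n. cat; tomcat", 50, True), True),
    (Line("maison n. house;", 52, True), True),
    (Line("home", 80, False), False),
]
model = learn_model(examples)
page = [
    Line("chien n. dog;", 50, True),
    Line("hound", 80, False),
    Line("manger v. to eat", 51, True),
]
lexicon = [tag_entry(e) for e in segment(page, model)]
print(lexicon)
```

Here the learned threshold is 65.5 pixels, so the indented line "hound" is attached to the entry for "chien". A real system would learn from many more layout features (fonts, spacing, punctuation) and richer entry grammars than this two-feature rule.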