Domain-Specific Term-List Expansion Using Existing Linguistic Resources
Abstract
This report describes a series of experiments involving expansion of a
domain-specific human-generated "seed list" using available linguistic
resources. The resources used for the expansion are intended to be general
purpose: two large-scale Chinese-English dictionaries and a Chinese lexical
knowledge base (HowNet). The methodology involves three steps: (1) hand
extraction of head words from each entry in the human-generated seed list;
(2) automatic comparison of these head words against entries in the
linguistic resources, where an entry matches if the head word matches the
entry exactly or is included in its semantic definition; and (3)
collection of any resulting matching entries into a larger term list. The
terms extracted by this process were then checked manually for relevance to
the target domain. An important
contribution of this work is the finding that the use of a bilingual term
list for the expansion process does not provide a significant improvement
over the use of a simpler, more easily produced, monolingual term list.
(Also LAMP-TR-092)
(Also UMIACS-TR-2002-79)
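
Steps (2) and (3) of the methodology can be illustrated with a minimal sketch. It is not the report's implementation: the function name expand_term_list, the representation of a resource as a mapping from each entry to a list of definition tokens, and the toy English data are all assumptions made for illustration. In particular, "included in its semantic definition" is interpreted here as membership in the entry's list of definition tokens, which is only one possible reading.

```python
from typing import Dict, Iterable, List


def expand_term_list(
    head_words: Iterable[str],
    resource: Dict[str, List[str]],
) -> List[str]:
    """Collect resource entries matched by at least one head word.

    An entry matches when a head word equals the entry itself or
    appears among the entry's definition tokens (an assumed reading
    of "included in its semantic definition").
    """
    heads = [h for h in head_words if h]
    matches = []
    for entry, definition_tokens in resource.items():
        if any(h == entry or h in definition_tokens for h in heads):
            matches.append(entry)
    return matches


# Hypothetical toy resource; real dictionary or HowNet entries differ.
toy_resource = {
    "missile": ["weapon", "projectile"],
    "rocket launcher": ["weapon", "device"],
    "teacup": ["tool", "vessel"],
}

# Head words assumed to have been extracted by hand from a seed list.
expanded = expand_term_list(["missile", "weapon"], toy_resource)
print(expanded)
```

The returned list corresponds to the larger term list that, in the report, was then checked manually for domain relevance.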