Domain-Specific Term-List Expansion Using Existing Linguistic Resources

Files

CS-TR-4399.pdf (148.72 KB)
CS-TR-4399.ps (399.76 KB)

Date

2002-10-03

Abstract

This report describes a series of experiments involving expansion of a domain-specific human-generated "seed list" using available linguistic resources. The resources used for the expansion are intended to be general purpose: two large-scale Chinese-English dictionaries and a Chinese lexical knowledge base (HowNet). The methodology involves three steps: (1) hand extraction of head words from each entry in the human-generated seed list; (2) automatic comparison of these head words against entries in the linguistic resources, where an entry matches if the head word matches the entry exactly or is included in its semantic definition; and (3) collection of any resulting matching entries into a larger term list. The terms extracted by this process were verified manually to confirm whether they were relevant to the topic of a specific domain. An important contribution of this work is the finding that the use of a bilingual term list for the expansion process does not provide a significant improvement over the use of a simpler, more easily produced, monolingual term list. (Also LAMP-TR-092) (Also UMIACS-TR-2002-79)
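
Steps (2) and (3) of the methodology can be illustrated with a minimal Python sketch. It is not the report's implementation. The `Entry` schema, the function names `matches` and `expand`, and the toy data are all assumptions made for illustration, and the semantic definition is assumed to be already split into tokens. Step (1), hand extraction of head words, is taken as given input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One resource entry (hypothetical schema, not the report's format)."""
    term: str
    definition: tuple[str, ...]  # semantic definition, assumed pre-tokenized


def matches(head_word: str, entry: Entry) -> bool:
    # Step 2: the head word equals the entry exactly,
    # or it is included in the entry's semantic definition.
    return head_word == entry.term or head_word in entry.definition


def expand(head_words: list[str], resource: list[Entry]) -> list[str]:
    # Step 3: collect matching entries in resource order, without duplicates.
    expanded: list[str] = []
    seen: set[str] = set()
    for entry in resource:
        if entry.term in seen:
            continue
        if any(matches(h, entry) for h in head_words):
            expanded.append(entry.term)
            seen.add(entry.term)
    return expanded


# Toy data invented for illustration only; not taken from the report,
# the dictionaries, or HowNet.
head_words = ["missile"]
resource = [
    Entry("missile", ("weapon",)),
    Entry("warhead", ("part", "missile")),
    Entry("teacup", ("tool", "drink")),
]
candidates = expand(head_words, resource)
print(candidates)
```

The resulting candidate list would still need the manual relevance check that the abstract describes, since a match on a definition token does not guarantee that the entry belongs to the domain.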
