Inducing Semantic Frames from Lexical Resources

dc.contributor.advisor: Dorr, Bonnie J (en_US)
dc.contributor.author: Green, Rebecca Joyce (en_US)
dc.contributor.department: Computer Science (en_US)
dc.date.accessioned: 2004-05-31T20:08:46Z
dc.date.available: 2004-05-31T20:08:46Z
dc.date.issued: 2004-02-17 (en_US)
dc.description.abstract: The fact that propositional content can be expressed in multiple ways is often referred to as the paraphrase problem. This phenomenon creates challenges for such applications as information retrieval, information extraction, text summarization, and machine translation: Natural language understanding needs to recognize what remains constant across paraphrases, while natural language generation needs the ability to express content in various ways. Frame semantics is a theory of language understanding that addresses the paraphrase problem by providing slot-and-filler templates to represent frequently occurring, structured experiences. This dissertation introduces SemFrame, a system that induces semantic frames automatically from lexical resources (WordNet and the Longman Dictionary of Contemporary English [LDOCE]). Prior to SemFrame, semantic frames had been developed only by hand. In SemFrame, frames are first identified by enumerating groups of verb senses that evoke a common frame. This is done by combining evidence about pairs of semantically related verbs, based on LDOCE's subject field codes, words used in LDOCE definitions and WordNet glosses, WordNet's array of semantic relationships, etc. Pairs are gathered into larger groupings, deemed to correspond to semantic frames. Nouns associated with the verbs evoking a frame are then analyzed against WordNet's semantic network to identify nodes corresponding to frame slots. SemFrame is evaluated in two ways: (1) Compared against the handcrafted FrameNet, SemFrame achieves its best recall-precision balance with 83.2% recall (based on SemFrame's coverage of FrameNet frames) and 73.8% precision (based on SemFrame verbs' semantic relatedness to other frame-evoking verbs). A WordNet-hierarchy-based lower bound achieves 52.8% recall and 46.6% precision. (2) A frame-semantic-enhanced version of Hearst's TextTiling algorithm, applied to detecting boundaries between consecutive documents, improves upon the non-enhanced TextTiling algorithm at statistically significant levels. (Previous enhancement of the text segmentation algorithm with thesaural relationships had degraded performance.) (en_US)
dc.format.extent: 10667835 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/193
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland (en_US)
dc.relation.isAvailableAt: University of Maryland (College Park, Md.) (en_US)
dc.subject.pqcontrolled: Computer Science (en_US)
dc.subject.pqcontrolled: Language, Linguistics (en_US)
dc.subject.pqcontrolled: Artificial Intelligence (en_US)
dc.title: Inducing Semantic Frames from Lexical Resources (en_US)
dc.type: Dissertation (en_US)
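
Illustrative note: the abstract says that pairs of semantically related verbs are "gathered into larger groupings" that are taken to correspond to frames. The record does not say how SemFrame performs this grouping, so the following is only a minimal sketch of one possible reading, in which pairs whose combined evidence score clears a threshold are merged into connected components. The function name, the scores, the threshold, and the example verbs are all hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


def group_verb_pairs(pairs, threshold=0.5):
    """Merge scored verb pairs into groups (connected components).

    `pairs` is an iterable of (verb_a, verb_b, score) tuples. The scores
    and the threshold are hypothetical stand-ins for combined evidence;
    they are NOT SemFrame's actual weighting scheme.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in pairs:
        # Look up both verbs so weakly related ones still appear as singletons.
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for verb in list(parent):
        groups[find(verb)].add(verb)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical evidence scores for illustration only.
example_pairs = [
    ("buy", "purchase", 0.9),
    ("purchase", "sell", 0.7),
    ("walk", "stroll", 0.8),
    ("buy", "walk", 0.1),
]
print(group_verb_pairs(example_pairs))
```

With these made-up scores, "buy", "purchase", and "sell" end up in one group and "stroll" and "walk" in another, because the weak "buy"/"walk" link falls below the threshold. Simple connected components can chain loosely related verbs together, so a real system would likely need a stricter grouping criterion.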

Files

Original bundle

Name: dissertation.pdf
Size: 10.17 MB
Format: Adobe Portable Document Format