Multiple Alternative Sentence Compressions as a Tool for Automatic Summarization Tasks

Zajic, David Michael

dc.contributor.advisor	Dorr, Bonnie J.	en_US
dc.contributor.advisor	Lin, Jimmy	en_US
dc.contributor.author	Zajic, David Michael	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2007-06-22T05:32:31Z
dc.date.available	2007-06-22T05:32:31Z
dc.date.issued	2007-04-05
dc.description.abstract	Automatic summarization is the distillation of important information from a source into an abridged form for a particular user or task. Many current systems summarize texts by selecting sentences with important content. The limitation of extraction at the sentence level is that highly relevant sentences may also contain non-relevant and redundant content. This thesis presents a novel framework for text summarization that addresses the limitations of sentence-level extraction. Under this framework text summarization is performed by generating Multiple Alternative Sentence Compressions (MASC) as candidate summary components and using weighted features of the candidates to construct summaries from them. Sentence compression is the rewriting of a sentence in a shorter form. This framework provides an environment in which hypotheses about summarization techniques can be tested. Three approaches to sentence compression were developed under this framework. The first approach, HMM Hedge, uses the Noisy Channel Model to calculate the most likely compressions of a sentence. The second approach, Trimmer, uses syntactic trimming rules that are linguistically motivated by Headlinese, a form of compressed English associated with newspaper headlines. The third approach, Topiary, is a combination of fluent text with topic terms. The MASC framework for automatic text summarization has been applied to the tasks of headline generation and multi-document summarization, and has been used for initial work in summarization of novel genres and applications, including broadcast news, email threads, cross-language, and structured queries. The framework supports combinations of component techniques, fostering collaboration between development teams. Three results will be demonstrated under the MASC framework. 
The first is that an extractive summarization system can produce better summaries by automatically selecting from a pool of compressed sentence candidates than by automatically selecting from unaltered source sentences. The second result is that sentence selectors can construct better summaries from pools of compressed candidates when they make use of larger candidate feature sets. The third result is that for the task of headline generation, a combination of topic terms and compressed sentences performs better than either approach alone. Experimental evidence supports all three results.	en_US
dc.format.extent	1057999 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1903/6729
dc.language.iso	en_US
dc.subject.pqcontrolled	Computer Science	en_US
dc.subject.pquncontrolled	Automatic Summarization	en_US
dc.subject.pquncontrolled	Sentence Compression	en_US
dc.subject.pquncontrolled	Natural Language Processing	en_US
dc.subject.pquncontrolled	Human Language Technology	en_US
dc.title	Multiple Alternative Sentence Compressions as a Tool for Automatic Summarization Tasks	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: umi-umd-4205.pdf
Size: 1.01 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations