Citation Handling: Processing Citation Texts in Scientific Documents

dc.contributor.advisor: Dorr, Bonnie [en_US]
dc.contributor.advisor: Zajic, David [en_US]
dc.contributor.author: Whidby, Michael Alan [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2012-10-11T05:51:34Z
dc.date.available: 2012-10-11T05:51:34Z
dc.date.issued: 2012 [en_US]
dc.description.abstract: Citation sentences (sentences that cite other papers) play a key role in the summarization of scientific articles. However, a citation-based summarization system that depends on generic natural language processing components, such as parsers or sentence compressors, will perform poorly if those components cannot handle citations correctly. In this thesis, I examine the effect of citation handling on parsing, sentence compression, and multi-document summarization. There are two types of citations that occur in citation sentences: constituent citations and parenthetical citations. I propose an automatic citation classifier based on training data created through Mechanical Turk tasks. I demonstrate that the use of type-specific citation handling as pre-processing improves the performance of a state-of-the-art generic parser, both for quality of the parse trees and running time. Extrinsic evaluations demonstrate that improving the performance of a parser on citation sentences in turn improves the performance of a sentence compressor, Trimmer (Zajic et al., 2007), and a multi-document summarization system, MASCS, according to several summarization measures. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/13176
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: citation [en_US]
dc.subject.pquncontrolled: multi-document summarization [en_US]
dc.subject.pquncontrolled: parsing [en_US]
dc.subject.pquncontrolled: sentence compression [en_US]
dc.title: Citation Handling: Processing Citation Texts in Scientific Documents [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Whidby_umd_0117N_13447.pdf
Size: 532.17 KB
Format: Adobe Portable Document Format