Text Summarization Evaluation: Correlating Human Performance on an Extrinsic Task with Automatic Intrinsic Metrics

dc.contributor.advisor: Dorr, Bonnie J (en_US)
dc.contributor.author: Hobson, Stacy (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2008-04-22T16:02:24Z
dc.date.available: 2008-04-22T16:02:24Z
dc.date.issued: 2007-10-17 (en_US)
dc.description.abstract: Text summarization evaluation is the process of assessing the quality of an individual summary produced by human or automatic methods. Many techniques have been proposed for text summarization and researchers require an easy and uniform method for evaluation of their summarization systems. Human evaluations are often costly, labor-intensive and time-consuming, but are known to produce the most accurate results. Automatic evaluations are fast, easy to use and reusable, but the quality of their results has not been independently shown to be similar to that of human evaluations. This thesis introduces a new human task-based summarization evaluation measure called Relevance Prediction that is a more intuitive measure of an individual's performance on a real-world task than agreement based on external judgments. Relevance Prediction parallels what a user does in the real-world task of browsing a set of documents using standard search tools, i.e., the user judges relevance based on a short summary and then that same user (not an independent user) decides whether to open (and judge) the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a current external gold-standard based measure used in the summarization evaluation community. Six experimental studies are conducted to examine the existence of correlations between the human task-based evaluations of text summarization and the output of current intrinsic automatic evaluation metrics. The experimental results indicate that moderate, yet consistent correlations exist between the Relevance-Prediction method and the ROUGE metric for single-document summarization. This work also formally establishes the usefulness of text summarization in reducing task time while maintaining a similar level of task judgment accuracy as seen with the full text documents. (en_US)
dc.format.extent: 682184 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/7623
dc.language.iso: en_US
dc.subject.pqcontrolled: Computer Science (en_US)
dc.subject.pquncontrolled: Text Summarization Evaluation (en_US)
dc.subject.pquncontrolled: Relevance Prediction (en_US)
dc.title: Text Summarization Evaluation: Correlating Human Performance on an Extrinsic Task with Automatic Intrinsic Metrics (en_US)
dc.type: Dissertation (en_US)

Files

Original bundle
Name: umi-umd-4893.pdf
Size: 666.2 KB
Format: Adobe Portable Document Format