Text Summarization Evaluation: Correlating Human Performance on an Extrinsic Task with Automatic Intrinsic Metrics

Hobson, Stacy

dc.contributor.advisor	Dorr, Bonnie J	en_US
dc.contributor.author	Hobson, Stacy	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2008-04-22T16:02:24Z
dc.date.available	2008-04-22T16:02:24Z
dc.date.issued	2007-10-17	en_US
dc.description.abstract	Text summarization evaluation is the process of assessing the quality of an individual summary produced by human or automatic methods. Many techniques have been proposed for text summarization, and researchers require an easy and uniform method for evaluating their summarization systems. Human evaluations are often costly, labor-intensive, and time-consuming, but are known to produce the most accurate results. Automatic evaluations are fast, easy to use, and reusable, but the quality of their results has not been independently shown to be similar to that of human evaluations. This thesis introduces a new human task-based summarization evaluation measure called Relevance Prediction, which is a more intuitive measure of an individual's performance on a real-world task than agreement based on external judgments. Relevance Prediction parallels what a user does in the real-world task of browsing a set of documents using standard search tools: the user judges relevance based on a short summary, and then that same user, not an independent user, decides whether to open (and judge) the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a current external gold-standard-based measure used in the summarization evaluation community. Six experimental studies are conducted to examine whether correlations exist between human task-based evaluations of text summarization and the output of current intrinsic automatic evaluation metrics. The experimental results indicate that moderate but consistent correlations exist between the Relevance Prediction method and the ROUGE metric for single-document summarization. This work also formally establishes the usefulness of text summarization in reducing task time while maintaining task-judgment accuracy similar to that achieved with the full-text documents.	en_US
dc.format.extent	682184 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1903/7623
dc.language.iso	en_US
dc.subject.pqcontrolled	Computer Science	en_US
dc.subject.pquncontrolled	Text Summarization Evaluation	en_US
dc.subject.pquncontrolled	Relevance Prediction	en_US
dc.title	Text Summarization Evaluation: Correlating Human Performance on an Extrinsic Task with Automatic Intrinsic Metrics	en_US
dc.type	Dissertation	en_US
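
The abstract defines Relevance Prediction as a comparison between a user's relevance judgment made from a summary and that same user's judgment of the full document, and it reports correlations between this measure and ROUGE. The following is a minimal sketch under stated assumptions, not the dissertation's exact formulation: it assumes Relevance Prediction is scored as a simple agreement rate over binary judgments, and it assumes per-system scores are compared with Pearson's r. The function names `relevance_prediction` and `pearson`, and all input values, are hypothetical placeholders rather than data from the thesis.

```python
from math import sqrt
from statistics import mean


def relevance_prediction(summary_judgments, fulltext_judgments):
    """Assumed scoring: fraction of documents where a user's relevance judgment
    made from the summary matches the SAME user's judgment of the full text."""
    if not summary_judgments or len(summary_judgments) != len(fulltext_judgments):
        raise ValueError("need two non-empty judgment lists of equal length")
    matches = sum(s == f for s, f in zip(summary_judgments, fulltext_judgments))
    return matches / len(summary_judgments)


def pearson(xs, ys):
    """Pearson correlation, e.g. between per-system Relevance Prediction
    scores and per-system ROUGE scores (an assumed choice of statistic)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical judgments for one user on four documents (True = relevant).
from_summary = [True, False, True, True]
from_fulltext = [True, False, False, True]
print(relevance_prediction(from_summary, from_fulltext))  # 3 of 4 agree: 0.75

# Hypothetical per-system scores, not results from the dissertation.
rp_scores = [0.6, 0.7, 0.8]
rouge_scores = [0.30, 0.35, 0.40]
print(pearson(rp_scores, rouge_scores))  # perfectly linear, so r is about 1.0
```

Because the second judgment comes from the same user rather than an external annotator, disagreements in this sketch reflect how well the summary supported that user's own decision, which is the distinction the abstract draws against LDC Agreement.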

Files

Original bundle

Name:: umi-umd-4893.pdf
Size:: 666.2 KB
Format:: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations