University of Maryland Libraries - Digital Repository at the University of Maryland (DRUM)

    Text Summarization Evaluation: Correlating Human Performance on an Extrinsic Task with Automatic Intrinsic Metrics

    File
    umi-umd-4893.pdf (666.1 KB)
    No. of downloads: 786

    Date
    2007-10-17
    Author
    Hobson, Stacy
    Advisor
    Dorr, Bonnie J
    Abstract
    Text summarization evaluation is the process of assessing the quality of an individual summary produced by human or automatic methods. Many techniques have been proposed for text summarization, and researchers require an easy, uniform method for evaluating their summarization systems. Human evaluations are often costly, labor-intensive, and time-consuming, but are known to produce the most accurate results. Automatic evaluations are fast, easy to use, and reusable, but the quality of their results has not been independently shown to be similar to that of human evaluations. This thesis introduces a new human task-based summarization evaluation measure called Relevance Prediction, which is a more intuitive measure of an individual's performance on a real-world task than agreement based on external judgments. Relevance Prediction parallels what a user does in the real-world task of browsing a set of documents using standard search tools: the user judges relevance based on a short summary, and then that same user (not an independent user) decides whether to open, and judge, the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a current external gold-standard-based measure used in the summarization evaluation community. Six experimental studies are conducted to examine whether correlations exist between the human task-based evaluations of text summarization and the output of current intrinsic automatic evaluation metrics. The experimental results indicate that moderate yet consistent correlations exist between the Relevance-Prediction method and the ROUGE metric for single-document summarization. This work also formally establishes the usefulness of text summarization in reducing task time while maintaining a level of task judgment accuracy similar to that seen with the full text documents.
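
    As the abstract describes it, Relevance Prediction scores within-subject agreement: the same user first judges relevance from a short summary and then judges the corresponding full document, and the measure is the proportion of pairs on which the two judgments match. The sketch below illustrates one plausible reading of that measure, along with the Pearson correlation one might use to relate per-system human scores to an automatic metric such as ROUGE. It is a minimal illustration, not the thesis's implementation: the boolean-judgment representation, function names, and all example numbers are assumptions.

        from typing import List, Tuple

        def relevance_prediction(pairs: List[Tuple[bool, bool]]) -> float:
            """Proportion of (summary judgment, document judgment) pairs in
            which a user's summary-based relevance call agrees with that
            same user's judgment of the full document."""
            if not pairs:
                raise ValueError("no judgments supplied")
            return sum(1 for s, d in pairs if s == d) / len(pairs)

        def pearson(xs: List[float], ys: List[float]) -> float:
            """Pearson correlation, e.g. between per-system Relevance
            Prediction scores and per-system ROUGE scores."""
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            sx = sum((x - mx) ** 2 for x in xs) ** 0.5
            sy = sum((y - my) ** 2 for y in ys) ** 0.5
            return cov / (sx * sy)

        # Hypothetical judgments for three summarization systems: each pair is
        # (relevance judged from the summary, relevance judged from the document).
        rp_scores = [relevance_prediction(p) for p in (
            [(True, True), (False, False), (True, False), (False, True)],
            [(True, True), (True, True), (True, True), (True, True)],
            [(False, False), (True, True), (True, True), (True, False)],
        )]
        rouge_scores = [0.31, 0.42, 0.38]  # made-up ROUGE values for illustration
        print(pearson(rp_scores, rouge_scores))  # correlation across the three systems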
    URI
    http://hdl.handle.net/1903/7623
    Collections
    • Computer Science Theses and Dissertations
    • UMD Theses and Dissertations

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.