Skip to content
University of Maryland LibrariesDigital Repository at the University of Maryland
    • Login
    View Item 
    •   DRUM
    • Theses and Dissertations from UMD
    • UMD Theses and Dissertations
    • View Item
    •   DRUM
    • Theses and Dissertations from UMD
    • UMD Theses and Dissertations
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Pig Squeal: Bridging Batch and Stream Processing Using Incremental Updates

    Thumbnail
    View/Open
    Lampton_umd_0117E_15998.pdf (1.380Mb)
    No. of downloads: 603

    Date
    2015
    Author
    Lampton, James Holmes
    Advisor
    Agrawala, Ashok
    DRUM DOI
    https://doi.org/10.13016/M2HC9H
    Metadata
    Show full item record
    Abstract
    As developers shift from batch MapReduce to stream processing for better latency, they are faced with the dilemma of changing tools and maintaining multiple code bases. In this work we present a method for converting arbitrary chains of MapReduce jobs into pipelined, incremental processes to be executed in a stream processing framework. Pig Squeal is an enhancement of the Pig execution framework that runs lightly modified user scripts on Storm. The contributions of this work include: an analysis that tracks how information flows through MapReduce computations along with the influence of adding and deleting data from the input, a structure to generically handle these changes along with a description of the criteria to re-enable efficiencies using combiners, case studies for running word count and the more complex NationMind algorithms within Squeal, and a performance model which examines execution times of MapReduce algorithms after converted. A general solution to the conversion of analytics from batch to streaming impacts developers with expertise in batch systems by providing a means to use their expertise in a new environment. Imagine a medical researcher who develops a model for predicting emergency situations in a hospital on historical data (in a batch system). They could apply these techniques to quickly deploy these detectors on live patient feeds. It also significantly impacts organizations with large investments in batch codes by providing a tool for rapid prototyping and significantly lowering the costs of experimenting in these new environments.
    URI
    http://hdl.handle.net/1903/16507
    Collections
    • Computer Science Theses and Dissertations
    • UMD Theses and Dissertations

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     

    Browse

    All of DRUMCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister
    Pages
    About DRUMAbout Download Statistics

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility