Pig Squeal: Bridging Batch and Stream Processing Using Incremental Updates

dc.contributor.advisor: Agrawala, Ashok [en_US]
dc.contributor.author: Lampton, James Holmes [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2015-06-25T05:49:24Z
dc.date.available: 2015-06-25T05:49:24Z
dc.date.issued: 2015 [en_US]
dc.description.abstract: As developers shift from batch MapReduce to stream processing for better latency, they face the dilemma of changing tools and maintaining multiple code bases. In this work we present a method for converting arbitrary chains of MapReduce jobs into pipelined, incremental processes that execute in a stream processing framework. Pig Squeal is an enhancement of the Pig execution framework that runs lightly modified user scripts on Storm. The contributions of this work include: an analysis that tracks how information flows through MapReduce computations and how adding and deleting input data influences the results; a structure for generically handling these changes, together with the criteria for re-enabling efficiencies using combiners; case studies running word count and the more complex NationMind algorithms within Squeal; and a performance model that examines the execution times of MapReduce algorithms after conversion. A general solution for converting analytics from batch to streaming helps developers with expertise in batch systems by giving them a means to apply that expertise in a new environment. Imagine a medical researcher who develops, on historical data in a batch system, a model for predicting emergency situations in a hospital. They could apply these techniques to quickly deploy the resulting detectors on live patient feeds. It also significantly benefits organizations with large investments in batch code by providing a tool for rapid prototyping and substantially lowering the cost of experimenting in these new environments. [en_US]
dc.identifier: https://doi.org/10.13016/M2HC9H
dc.identifier.uri: http://hdl.handle.net/1903/16507
dc.language.iso: en [en_US]
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: batch [en_US]
dc.subject.pquncontrolled: delta [en_US]
dc.subject.pquncontrolled: incremental [en_US]
dc.subject.pquncontrolled: performance [en_US]
dc.subject.pquncontrolled: pig [en_US]
dc.subject.pquncontrolled: streaming [en_US]
dc.title: Pig Squeal: Bridging Batch and Stream Processing Using Incremental Updates [en_US]
dc.type: Dissertation [en_US]
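The abstract's central idea, treating changes to the input as signed insertions and deletions that flow through a MapReduce computation, and re-enabling combiners when the aggregation allows it, can be illustrated with word count. The following is a minimal Python sketch under stated assumptions, not Pig Squeal's implementation: each input change is a hypothetical (line, sign) pair where +1 marks an inserted line and -1 a deleted one, and the function names map_deltas, combine, and apply_update are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical delta format (an assumption, not Pig Squeal's):
# (line_of_text, sign) where sign is +1 for an insertion, -1 for a deletion.
Delta = Tuple[str, int]


def map_deltas(deltas: Iterable[Delta]) -> Iterable[Tuple[str, int]]:
    """Map phase: emit (word, signed count) for every word of every changed line."""
    for line, sign in deltas:
        for word in line.split():
            yield word, sign


def combine(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Combiner: pre-aggregate signed counts per word before the reduce step.

    This is safe for word count because integer addition is associative,
    commutative, and invertible, so a deletion can cancel an earlier insertion.
    """
    partial: Counter = Counter()
    for word, count in pairs:
        partial[word] += count  # item assignment keeps negative values
    return partial


def apply_update(state: Dict[str, int], partial: Counter) -> Dict[str, int]:
    """Reduce phase: fold one batch of combined deltas into the running counts."""
    for word, change in partial.items():
        new_total = state.get(word, 0) + change
        if new_total == 0:
            state.pop(word, None)  # drop words whose count returns to zero
        else:
            state[word] = new_total
    return state


if __name__ == "__main__":
    counts: Dict[str, int] = {}
    apply_update(counts, combine(map_deltas([("the cat", 1), ("the dog", 1)])))
    print(counts)  # {'the': 2, 'cat': 1, 'dog': 1}
    apply_update(counts, combine(map_deltas([("the cat", -1)])))
    print(counts)  # {'the': 1, 'dog': 1}
```

The reduce state is updated in place rather than recomputed from scratch, which is the point of the incremental approach; an aggregation without inverses (for example, a maximum) could not absorb deletions this simply.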

Files

Original bundle

Name: Lampton_umd_0117E_15998.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format