UMD Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/3

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay before a given thesis/dissertation appears in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 3 of 3
  • Item
    STRUCTANT: A CONTEXT-AWARE TASK MANAGEMENT FRAMEWORK FOR HETEROGENEOUS COMPUTATIONAL ENVIRONMENTS
    (2019) Pachulski, Andrew J; Agrawala, Ashok; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The Internet of Things has produced a plethora of devices, systems, and networks able to produce, transmit, and process data at unprecedented rates. These data can have tremendous value for businesses, organizations, and researchers who wish to better serve an audience or understand a topic. Pipelining is a common technique used to automate the scraping, processing, transport, and analytic steps necessary for collecting and utilizing these data.

    Each step in a pipeline may have specific physical, virtual, and organizational processing requirements that dictate when the step can run and what machines can run it. Physical processing requirements may include hardware-specific computing capabilities such as the presence of Graphics Processing Units (GPUs), memory capacity, and specific CPU instruction sets. Virtual processing requirements may include job precedence, machine architecture, availability of input datasets, runtime libraries, and executable code. Organizational processing requirements may include encryption standards for data transport and data at rest, physical server security, and monetary budget constraints. Moreover, these processing requirements may have dynamic or temporal properties not known until schedule time.

    These processing requirements can greatly impact the ability of organizations to use these data. Despite the popularity of Big Data and cloud computing and the plethora of tools they provide, organizations still face challenges when attempting to adopt these solutions. These challenges include the need to recreate the pipeline, cryptic configuration parameters, and the inability to support rapid deployment and modification for data exploration. Prior work has focused on solutions that apply only to specific steps, platforms, or algorithms in the pipeline, without considering the abundance of information that describes the processing environment and operations.

    In this dissertation, we present Structant, a context-aware task management framework and scheduler that helps users manage complex physical, virtual, and organizational processing requirements. Structant models jobs, machines, links, and datasets by storing contextual information for each entity in the Computational Environment. Through inference of this contextual information, Structant creates mappings of jobs to resources that satisfy all relevant processing requirements. As jobs execute, Structant observes performance and creates runtime estimates for new jobs based on prior execution traces and relevant context selection. Using runtime estimates, Structant can schedule jobs with respect to dynamic and temporal processing requirements.

    We present results from three experiments to demonstrate how Structant can aid a user in running both simple and complex pipelines. In our first experiment, we demonstrate how Structant can schedule data collection, processing, and movement with virtual processing requirements to facilitate forward prediction of communities at risk for opioid epidemics. In our second experiment, we demonstrate how Structant can profile operations and obey temporal organizational policies to schedule data movement with fewer preemptions than two naive scheduling algorithms. In our third experiment, we demonstrate how Structant can acquire external contextual information from server room monitors and maintain regulatory compliance of the processing environment by shutting down machines according to a predetermined pipeline.
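    The abstract describes two core ideas: matching jobs to resources that satisfy contextual processing requirements, and estimating runtimes from prior execution traces. The sketch below illustrates those ideas in minimal form; it is not Structant's actual interface, and the class names, fields, and greedy assignment policy are assumptions made only for illustration.

```python
# Minimal sketch (not Structant's API) of context-aware job-to-resource matching
# with runtime estimates derived from prior execution traces.

from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Machine:
    name: str
    capabilities: set          # e.g. {"gpu", "avx2", "encrypted_disk"}
    memory_gb: int

@dataclass
class Job:
    name: str
    requirements: set          # contextual requirements the machine must satisfy
    min_memory_gb: int
    history: list = field(default_factory=list)   # prior runtimes in seconds

def eligible(job: Job, machine: Machine) -> bool:
    """A machine is eligible only if it satisfies every contextual requirement."""
    return job.requirements <= machine.capabilities and machine.memory_gb >= job.min_memory_gb

def estimate_runtime(job: Job, default: float = 600.0) -> float:
    """Estimate runtime from prior execution traces, falling back to a default."""
    return mean(job.history) if job.history else default

def schedule(jobs, machines):
    """Greedily assign each job to the eligible machine that frees up earliest."""
    free_at = {m.name: 0.0 for m in machines}
    plan = []
    for job in sorted(jobs, key=estimate_runtime, reverse=True):
        candidates = [m for m in machines if eligible(job, m)]
        if not candidates:
            plan.append((job.name, None, None))    # no resource satisfies the requirements
            continue
        m = min(candidates, key=lambda m: free_at[m.name])
        start = free_at[m.name]
        free_at[m.name] = start + estimate_runtime(job)
        plan.append((job.name, m.name, start))
    return plan

if __name__ == "__main__":
    machines = [Machine("gpu-node", {"gpu", "encrypted_disk"}, 64),
                Machine("cpu-node", {"avx2"}, 32)]
    jobs = [Job("train-model", {"gpu"}, 32, history=[1200.0, 1100.0]),
            Job("scrape-data", set(), 4)]
    for name, machine, start in schedule(jobs, machines):
        print(f"{name} -> {machine} at t={start}")
```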
  • Item
    Computational Methods to Advance Phylogenomic Workflows
    (2015) Bazinet, Adam Lee; Cummings, Michael P; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Phylogenomics refers to the use of genome-scale data in phylogenetic analysis. There are several methods for acquiring genome-scale, phylogenetically useful data from an organism that avoid sequencing the entire genome, thus reducing cost and effort, and enabling one to sequence many more individuals. In this dissertation we focus on one method in particular, RNA sequencing, and the concomitant use of assembled protein-coding transcripts in phylogeny reconstruction. Phylogenomic workflows involve tasks that are algorithmically and computationally demanding, in part due to the large amount of sequence data typically included in such analyses. This dissertation applies techniques from computer science to improve methodology and performance associated with phylogenomic workflow tasks such as sequence classification, transcript assembly, orthology determination, and phylogenetic analysis. While the majority of the methods developed in this dissertation can be applied to the analysis of diverse organismal groups, we primarily focus on the analysis of transcriptome data from Lepidoptera (moths and butterflies), generated as part of a collaboration known as “Leptree”.
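    As a rough illustration of how the workflow tasks named above (sequence classification, transcript assembly, orthology determination, feeding a phylogenetic analysis) chain together, the sketch below wires placeholder functions into a linear pipeline. It is not the dissertation's software; every function, data shape, and gene name is an assumption made for illustration only.

```python
# Minimal sketch of a phylogenomic workflow as a linear pipeline.
# The stage functions are placeholders standing in for real tools
# (read classifiers, assemblers, orthology and tree-inference software).

def classify_reads(raw_reads):
    """Keep only reads of interest (here, those flagged as protein-coding)."""
    return [r for r in raw_reads if r.get("coding", False)]

def assemble_transcripts(reads):
    """Assemble classified reads into transcripts (placeholder: concatenate by gene)."""
    transcripts = {}
    for r in reads:
        transcripts.setdefault(r["gene"], []).append(r["seq"])
    return {gene: "".join(parts) for gene, parts in transcripts.items()}

def determine_orthologs(transcripts, reference_genes):
    """Retain only transcripts matching genes in a reference ortholog set."""
    return {g: t for g, t in transcripts.items() if g in reference_genes}

def run_pipeline(raw_reads, reference_genes):
    reads = classify_reads(raw_reads)
    transcripts = assemble_transcripts(reads)
    orthologs = determine_orthologs(transcripts, reference_genes)
    return orthologs   # these would feed the phylogenetic analysis step

if __name__ == "__main__":
    raw = [{"gene": "COI", "seq": "ATG", "coding": True},
           {"gene": "COI", "seq": "GCA", "coding": True},
           {"gene": "junk", "seq": "NNN", "coding": False}]
    print(run_pipeline(raw, reference_genes={"COI", "EF1a"}))
```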
  • Item
    Using machine learning to measure the cross section of top quark pairs in the muon+jets channel at the Compact Muon Solenoid
    (2011) Kirn, Malina Aurelia; Hadley, Nicholas; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The cross section for pp to top-antitop production at a center-of-mass energy of 7 TeV is measured using a data sample with an integrated luminosity of 36.1 inverse pb collected by the CMS detector at the LHC. The analysis is performed on a computing grid. Events with an isolated muon and three hadronic jets are analyzed using a multivariate machine learning algorithm. Kinematic variables and b tags are provided as input to the algorithm; the output from the algorithm is used in a maximum likelihood fit to determine the top-antitop event yield. The measured cross section is 151 +/- 15 (stat.) +35/-28 (syst.) +/- 6 (lumi.) pb.
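    The final step described above, fitting the classifier output with a maximum likelihood model to extract the signal yield, can be illustrated with a small binned-template fit. This is a sketch rather than the thesis analysis code: the template shapes, data counts, and efficiency are invented for illustration, and only the quoted 36.1 inverse pb comes from the abstract.

```python
# Minimal sketch of a binned maximum-likelihood fit of signal and background
# yields to an observed classifier-output distribution, followed by a
# cross-section estimate sigma = N_signal / (efficiency * luminosity).

import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Normalized template shapes of the classifier output (fraction of events per bin).
sig_shape = np.array([0.05, 0.10, 0.25, 0.60])   # signal peaks at high output
bkg_shape = np.array([0.50, 0.30, 0.15, 0.05])   # background peaks at low output
data      = np.array([120,   90,   80,  100])    # observed event counts (made up)

def neg_log_likelihood(params):
    """Poisson negative log-likelihood for (signal yield, background yield)."""
    n_sig, n_bkg = params
    expected = n_sig * sig_shape + n_bkg * bkg_shape
    expected = np.clip(expected, 1e-9, None)
    return np.sum(expected - data * np.log(expected) + gammaln(data + 1))

result = minimize(neg_log_likelihood, x0=[100.0, 300.0],
                  bounds=[(0, None), (0, None)])
n_sig_fit, n_bkg_fit = result.x
print(f"fitted signal yield: {n_sig_fit:.1f}, background yield: {n_bkg_fit:.1f}")

# Convert the fitted yield to a cross section under an assumed efficiency.
efficiency = 0.04    # hypothetical acceptance * efficiency
lumi_pb = 36.1       # integrated luminosity in inverse picobarns (from the abstract)
print(f"cross section ~ {n_sig_fit / (efficiency * lumi_pb):.1f} pb")
```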