Show simple item record

dc.contributor.advisorKhuller, Samiren_US
dc.contributor.authorPurohit, Manishen_US
dc.date.accessioned2016-06-22T06:09:50Z
dc.date.available2016-06-22T06:09:50Z
dc.date.issued2016en_US
dc.identifierdoi:10.13016/M22F61
dc.identifier.urihttp://hdl.handle.net/1903/18353
dc.description.abstractDatacenters have emerged as the dominant form of computing infrastructure over the last two decades. The tremendous increase in the requirements of data analysis has led to a proportional increase in power consumption and datacenters are now one of the fastest growing electricity consumers in the United States. Another rising concern is the loss of throughput due to network congestion. Scheduling models that do not explicitly account for data placement may lead to a transfer of large amounts of data over the network causing unacceptable delays. In this dissertation, we study different scheduling models that are inspired by the dual objectives of minimizing energy costs and network congestion in a datacenter. As datacenters are equipped to handle peak workloads, the average server utilization in most datacenters is very low. As a result, one can achieve huge energy savings by selectively shutting down machines when demand is low. In this dissertation, we introduce the network-aware machine activation problem to find a schedule that simultaneously minimizes the number of machines necessary and the congestion incurred in the network. Our model significantly generalizes well-studied combinatorial optimization problems such as hard-capacitated hypergraph covering and is thus strongly NP-hard. As a result, we focus on finding good approximation algorithms. Data-parallel computation frameworks such as MapReduce have popularized the design of applications that require a large amount of communication between different machines. Efficient scheduling of these communication demands is essential to guarantee efficient execution of the different applications. In the second part of the thesis, we study the approximability of the co-flow scheduling problem that has been recently introduced to capture these application-level demands. Finally, we also study the question, "In what order should one process jobs?'' Often, precedence constraints specify a partial order over the set of jobs and the objective is to find suitable schedules that satisfy the partial order. However, in the presence of hard deadline constraints, it may be impossible to find a schedule that satisfies all precedence constraints. In this thesis we formalize different variants of job scheduling with soft precedence constraints and conduct the first systematic study of these problems.en_US
dc.language.isoenen_US
dc.titleData-Aware Scheduling in Datacentersen_US
dc.typeDissertationen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.contributor.departmentComputer Scienceen_US
dc.subject.pqcontrolledComputer scienceen_US
dc.subject.pquncontrolledApproximation algorithmsen_US
dc.subject.pquncontrolledData-aware job placementen_US
dc.subject.pquncontrolledEnergy-efficient schedulingen_US
dc.subject.pquncontrolledSoft precedence constraintsen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record