Automating Performance Diagnosis in Networked Systems

dc.contributor.advisor: Hicks, Michael W [en_US]
dc.contributor.author: McCann, Justin N. [en_US]
dc.contributor.department: Computer Science [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.date.accessioned: 2012-10-10T11:15:34Z
dc.date.available: 2012-10-10T11:15:34Z
dc.date.issued: 2012 [en_US]
dc.description.abstract: Diagnosing performance degradation in distributed systems is a complex and difficult task. Software that performs well in one environment may be unusably slow in another, and determining the root cause is time-consuming and error-prone, even in environments in which all the data may be available. End users have an even more difficult time trying to diagnose system performance, since both software and network problems have the same symptom: a stalled application. The central thesis of this dissertation is that the source of performance stalls in a distributed system can be automatically detected and diagnosed with very limited information: the dependency graph of data flows through the system, and a few counters common to almost all data processing systems. This dissertation presents FlowDiagnoser, an automated approach for diagnosing performance stalls in networked systems. FlowDiagnoser requires as little as two bits of information per module to make a diagnosis: one to indicate whether the module is actively processing data, and one to indicate whether the module is waiting on its dependents. To support this thesis, FlowDiagnoser is implemented in two distinct environments: an individual host's networking stack, and a distributed streams processing system. In controlled experiments using real applications, FlowDiagnoser correctly diagnoses 99% of networking-related stalls due to application, connection-specific, or network-wide performance problems, with a false positive rate under 3%. The prototype system for diagnosing messaging stalls in a commercial streams processing system correctly finds 93% of message-processing stalls, with a false positive rate of 2%. [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/12996
dc.subject.pqcontrolled: Computer science [en_US]
dc.subject.pquncontrolled: diagnosis [en_US]
dc.subject.pquncontrolled: distributed systems [en_US]
dc.subject.pquncontrolled: networking [en_US]
dc.subject.pquncontrolled: performance [en_US]
dc.title: Automating Performance Diagnosis in Networked Systems [en_US]
dc.type: Dissertation [en_US]
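
The abstract above describes diagnosis from a dependency graph plus two bits per module (actively processing, waiting). The following is only an illustrative sketch of how such a walk might look, not the dissertation's algorithm: the `Module` class, the `waits_on` field, the `find_stall_sources` function, the direction of the edges, and the "busy"/"idle" labels are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    active: bool   # bit 1: the module is actively processing data
    waiting: bool  # bit 2: the module is waiting on another module
    waits_on: list = field(default_factory=list)  # hypothetical graph edges


def find_stall_sources(modules, start):
    """Follow 'waiting' edges from `start` and return the modules where
    the chain ends, i.e. modules that are not waiting on anyone else."""
    by_name = {m.name: m for m in modules}
    seen, stack, sources = set(), [start], []
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # guard against cycles in the graph
        seen.add(name)
        m = by_name[name]
        if m.waiting and m.waits_on:
            stack.extend(m.waits_on)  # blame is passed further along
        else:
            sources.append((name, "busy" if m.active else "idle"))
    return sorted(sources)


# Hypothetical three-module chain: application -> TCP -> network interface.
example = [
    Module("app", active=False, waiting=True, waits_on=["tcp"]),
    Module("tcp", active=False, waiting=True, waits_on=["nic"]),
    Module("nic", active=True, waiting=False),
]
print(find_stall_sources(example, "app"))
```

In this toy chain the walk ends at the one module that is busy and not waiting, which is the kind of candidate a two-bit scheme could plausibly flag; the real system's rules may differ.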

Files

Original bundle
Name: McCann_umd_0117E_13285.pdf
Size: 2.74 MB
Format: Adobe Portable Document Format