UMD Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/3
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date, so a given thesis/dissertation may take up to four months to appear in DRUM.
More information is available on the Theses and Dissertations page of the University of Maryland Libraries.
Item Planetary Scale Data Storage (2018) Bengfort, Benjamin; Keleher, Peter J; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The success of virtualization and container-based application deployment has fundamentally changed computing infrastructure from dedicated hardware provisioning to on-demand, shared clouds of computational resources. One of the most interesting effects of this shift is the opportunity to localize applications in multiple geographies and support mobile users around the globe. With relatively few steps, an application and its data systems can be deployed and scaled across continents and oceans, leveraging the existing data centers of much larger cloud providers. The novelty and ease of a global computing context mean that we are closer to the advent of an OceanStore, an Internet-like revolution in personalized, persistent data that securely travels with its users. At a global scale, however, data systems suffer from physical limitations that significantly impact their consistency and performance. Even with modern telecommunications technology, the latency in communication from Brazil to Japan results in noticeable synchronization delays that violate user expectations. Moreover, the required scale of such systems means that failure is routine. To address these issues, we explore consistency in the implementation of distributed logs, key/value databases, and file systems that are replicated across wide areas. At the core of our system is hierarchical consensus, a geographically distributed consensus algorithm that provides strong consistency, fault tolerance, durability, and adaptability to varying user access patterns.
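The abstract does not include code, but the core idea of hierarchical consensus can be sketched: a root quorum partitions ownership of the keyspace among small geographic subquorums, and each subquorum orders writes for its keys locally. The sketch below is purely illustrative; all class and method names are hypothetical, and the local `propose` stands in for a real Raft/Paxos round among nearby replicas.

```python
# Illustrative sketch of hierarchical consensus routing (hypothetical
# names, not the dissertation's API): a root quorum maps keys to
# geographic subquorums; each subquorum commits writes locally.
from dataclasses import dataclass, field

@dataclass
class Subquorum:
    """A small group of nearby replicas running local consensus."""
    region: str
    log: list = field(default_factory=list)

    def propose(self, key, value):
        # A real system would run a consensus round among the
        # subquorum's replicas here; we simply append to a local log.
        self.log.append((key, value))
        return len(self.log)  # commit index within this subquorum

class RootQuorum:
    """Assigns key ownership to subquorums (static hash partitioning).

    The real system adapts ownership to observed access patterns so
    that keys are owned near the users who touch them most often.
    """
    def __init__(self, subquorums):
        self.subquorums = subquorums

    def owner(self, key):
        return self.subquorums[hash(key) % len(self.subquorums)]

    def write(self, key, value):
        # Route the write to the owning subquorum; only that small,
        # geographically close group must coordinate on the fast path.
        return self.owner(key).propose(key, value)

regions = [Subquorum("us-east"), Subquorum("eu-west"), Subquorum("ap-ne")]
root = RootQuorum(regions)
root.write("user:42", "profile-v1")
```

The benefit this models: a write only needs agreement within one small, nearby subquorum rather than a planet-wide quorum, which is what keeps cross-ocean latency off the common path.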
Using hierarchical consensus as a backbone, we further extend our system from data centers to edge regions using federated consistency, an adaptive consistency model that gives satellite replicas high availability with stronger global consistency than existing weak consistency models provide. In a deployment of 105 replicas in 15 geographic regions across 5 continents, we show that our implementation provides high throughput, strong consistency, and resiliency in the face of failure. From our experimental validation, we conclude that planetary-scale data storage systems can be implemented algorithmically without sacrificing consistency or performance.

Item ESSAYS IN STATISTICAL ANALYSIS: ISOTONIC REGRESSION AND FILTERING (2018) Xue, Jinhang; Ryzhov, Ilya O; Smith, Paul J; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In many real-world applications in optimal information collection and stochastic approximation, statistical estimators are often constructed to learn the true parameter value of some utility function or underlying signal. Many of these estimators exhibit excellent empirical performance, but full analyses of their consistency were not previously available, putting decision-makers in somewhat of a predicament regarding implementation. The goal of this dissertation is to fill this gap by supplying the missing consistency proofs. The first part of this thesis considers the consistency of estimating a monotonic cost function that appears in an optimal learning algorithm combining isotonic regression with a Bayesian policy known as Knowledge Gradient with Discrete Priors (KGDP). Isotonic regression deals with regression problems under order constraints.
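For readers unfamiliar with isotonic regression, the classic least-squares monotone fit can be computed with the pool-adjacent-violators algorithm (PAVA). The sketch below illustrates the general technique only; it is not the dissertation's estimator.

```python
# Minimal pool-adjacent-violators algorithm (PAVA): fit the
# nondecreasing sequence minimizing squared error to the data.
# Illustrative sketch, not the dissertation's code.
def isotonic_fit(y):
    # Maintain a stack of blocks as [mean, weight] pairs.
    merged = []
    for v in y:
        merged.append([float(v), 1])
        # Pool adjacent blocks while monotonicity is violated:
        # replace two out-of-order blocks with their weighted mean.
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, w2 = merged.pop()
            m1, w1 = merged.pop()
            merged.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    # Expand block means back into a full-length fitted sequence.
    fit = []
    for mean, weight in merged:
        fit.extend([mean] * weight)
    return fit
```

For example, `isotonic_fit([1, 3, 2, 4])` pools the out-of-order pair (3, 2) into their mean 2.5, yielding the monotone fit [1, 2.5, 2.5, 4].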
Previous literature proposed estimating the cost function by a weighted sum over a pool of candidate curves, each generated by the isotonic regression estimator from all previously collected observations, with the weights calculated by KGDP. Our primary objective is to establish the consistency of the suggested estimator. Some minor results regarding the knowledge gradient algorithm and the isotonic regression estimator under insufficient observations are also discussed. The second part of this thesis focuses on the convergence of the bias-adjusted Kalman filter (BAKF). The BAKF algorithm is designed to optimize the statistical estimation of a non-stationary signal that can only be observed with stochastic noise. The algorithm has numerous applications in dynamic programming and signal processing. However, a consistency analysis of the process that approximates the underlying signal has heretofore not been available. We resolve this open issue by showing that the BAKF stepsize satisfies the well-known conditions for almost sure convergence of a stochastic approximation sequence, with only one additional assumption on the convergence rate of the signal compared to those used in the derivation of the original problem.

Item DECOUPLING CONSISTENCY DETERMINATION AND TRUST FROM THE UNDERLYING DISTRIBUTED DATA STORES (2018) Lekakis, Vasileios; Keleher, Peter J; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Building applications on cloud services is cost-effective and allows for rapid development and release cycles. However, relying on cloud services can severely limit applications' ability to control their own consistency policies, and their ability to control data visibility during replication.
To understand the tension between strong consistency and security guarantees on one hand and high availability, flexible replication, and performance on the other, it helps to consider two questions. First, is it possible for an application to achieve stricter consistency guarantees than what the cloud provider offers? If we rely solely on the provider's service interface, the answer is no. However, if we allow applications to determine the implementation and execution of the consistency protocols, then we can achieve much more. The second question is: can an application relay updates over untrusted replicas without revealing sensitive information, while maintaining the desired consistency guarantees? Simply encrypting the data is not enough. Encryption does not eliminate the information leakage that comes from the metadata needed to execute any consistency protocol. The alternative to encryption—allowing the flow of updates only through trusted replicas—leads to predefined communication patterns. This approach is prone to failures that can cause partitioning in the system. One way to answer "yes" to this question is to allow trust relationships, defined at the application level, to guide the synchronization protocol. My goal in this thesis is to build systems that take advantage of the performance, scalability, and availability of cloud storage services while, at the same time, bypassing the limitations imposed by cloud service providers' design choices. The key to achieving this is pushing application-specific decisions where they belong: the application. I defend the following thesis statement: By decoupling consistency determination and trust from the underlying distributed data store, it is possible to (1) support application-specific consistency guarantees and (2) allow for topology-independent replication protocols that do not compromise application privacy.
First, I design and implement Shell, a system architecture for supporting strict consistency guarantees over eventually consistent data stores. Shell is a software layer designed to isolate consistency implementations and cloud-provider APIs from the application code. Shell consists of four internal modules and an application store, which together abstract consistency-related operations and encapsulate communication with the underlying storage layers. Apart from consistency protocols tailored to application needs, Shell provides application-aware conflict resolution without relying on generic heuristics such as "last write wins." Shell does not require the application to maintain dependency-tracking information for the execution of the consistency protocols, as other existing approaches do. I experimentally evaluate Shell over two different data stores using real-application traces, and find that using Shell can reduce inconsistent updates by 10%. I also measure and report the overheads introduced by the Shell layer. Second, I design and implement T.Rex, a system for supporting topology-independent replication without assuming trust between all participating replicas. T.Rex uses role-based access control to enable flexible and secure sharing among users with widely varying collaboration types: both users and data items are assigned roles, and a user can access a data item only if the two share at least one role. Building on top of this abstraction, T.Rex includes several novel mechanisms: I introduce role proofs to prove role membership to others in the role without leaking information to those not in the role. Additionally, I introduce role coherence to prevent updates from leaking across roles. Finally, I use Bloom filters as opaque digests to enable querying of remote cache state without being able to enumerate it. I combine these mechanisms to develop a novel, cryptographically secure, and efficient anti-entropy protocol, T.Rex-Sync.
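The Bloom-filter-as-opaque-digest idea generalizes beyond T.Rex: a replica can answer "might you have item X?" while the digest itself reveals no enumerable list of items. A minimal illustrative sketch follows; the class name, sizing, and hashing scheme are assumptions for illustration, not T.Rex's actual protocol.

```python
# Illustrative Bloom filter used as an opaque digest of cached item
# IDs. A peer holding the digest can test membership of IDs it
# already knows, but cannot enumerate the set from the bit array.
# Names and parameters are hypothetical, not from T.Rex.
import hashlib

class BloomDigest:
    def __init__(self, size_bits=1024, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 hashes.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

During anti-entropy, a replica would exchange such digests and fetch only the items the peer's digest does not appear to contain; false positives merely delay propagation of an item, never corrupt it.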
I evaluate T.Rex on a local test-bed, and I show that it achieves security with modest computational and storage overheads.