Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 10 of 669
  • Item
    Automated Management of Network Slices with Service Guarantees
    (2024) Nikolaidis, Panagiotis; Baras, John; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Future mobile networks are expected to support a diverse set of applications including high-throughput video streaming, delay-sensitive augmented reality applications, and critical control traffic for autonomous driving. Unfortunately, existing networks do not have the required management mechanisms to handle this complex mix of traffic efficiently. At the same time, however, there is a significant effort from both industry and academia to make networks more open and programmable, leading to the emergence of software-defined networking, network function virtualization, and packet-forwarding programming languages. Moreover, several organizations such as the Open Networking Foundation were founded to facilitate innovation and lower the entry barriers in the mobile networking industry. In this setting, the concept of network slicing emerged, which involves the partitioning of the mobile network into virtual networks that are tailored for specific applications. Each network slice needs to provide premium service to its users as specified in a service level agreement between the mobile network operator and the customer. The deployment of network slices has been largely realized thanks to network function virtualization. However, little progress has been made on mechanisms to efficiently share the network resources among them. In this dissertation, we develop such mechanisms for the licensed spectrum at the base station, a scarce resource that operators obtain through competitive auctions. We propose a system architecture composed of two new network functions: the bandwidth demand estimator and the network slice multiplexer. The bandwidth demand estimator monitors the traffic of the network slice and outputs the amount of bandwidth currently needed to deliver the desired quality of service. The network slice multiplexer decides which bandwidth demands to accept when the available bandwidth does not suffice for all the network slices.
A key feature of this architecture is the separation of the demand estimation task from the contention resolution task. This separation makes the architecture scalable to a large number of network slices. It also allows the mobile network operator to charge each customer fairly based on their bandwidth demands. In contrast, the most common approach in the literature is to learn online how to split the available resources among the slices to maximize a total network utility. However, this approach is neither scalable nor suitable for service level agreements. The dissertation contributes several algorithms to realize the proposed architecture and provisioning methods to guarantee the fulfillment of the service level agreements. To satisfy packet delay requirements, we develop a bandwidth demand estimator based on queueing theory and online learning. To share resources efficiently even in the presence of traffic anomalies, we develop a network slice multiplexer based on the Max-Weight algorithm and hypothesis testing. We implement and test the proposed algorithms on network simulators and 5G testbeds to showcase their efficiency in realistic settings. Overall, we present a scalable architecture that is robust to traffic anomalies and reduces the bandwidth needed to serve multiple network slices.
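The contention-resolution step described above can be sketched in a few lines. This is an illustrative toy, not the dissertation's implementation: the slice names, the deficit-counter weights, and the greedy admission rule are assumptions made for illustration of a Max-Weight-style multiplexer.

```python
def multiplex(demands, deficits, capacity):
    """Admit slice demands in decreasing order of weight = deficit * demand.

    demands:  dict slice -> requested bandwidth (MHz)
    deficits: dict slice -> accumulated unserved demand (the Max-Weight 'queue')
    capacity: total bandwidth available in this scheduling interval
    """
    # Max-Weight flavor: serve the slices whose backlog-weighted demand is largest.
    order = sorted(demands, key=lambda s: deficits[s] * demands[s], reverse=True)
    admitted, remaining = {}, capacity
    for s in order:
        grant = min(demands[s], remaining)
        admitted[s] = grant
        remaining -= grant
    return admitted

# Toy scenario: total demand (90) exceeds capacity (60), so the multiplexer
# must decide which demands to satisfy first.
grants = multiplex(
    demands={"video": 40, "ar": 30, "v2x": 20},
    deficits={"video": 1.0, "ar": 3.0, "v2x": 5.0},
    capacity=60,
)
```

Slices with a large accumulated deficit are served first, which is what gives Max-Weight-style schedulers their stability properties.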
  • Item
    Algorithmic approaches for investigating DNA Methylation in tumor evolution and heterogeneity
    (2024) Li, Xuan; Sahinalp, S. Cenk; Mount, Stephen M.; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Intratumor heterogeneity and tumor diversity of cancer impose significant challenges on the prospect of personalized cancer diagnosis, treatment, and prognostics. While many studies seek to understand the complex dynamics of cancer with theoretically well-suited biomarkers like DNA mutations, the relative molecular rigidity and sparsity of mutations often make it challenging to reconstruct reliable tumor lineages from mutation profiles in practice. Epigenetic markers like DNA methylation, on the other hand, serve as a promising alternative to elucidate intratumor heterogeneity and tumor diversity. However, systematic research leveraging algorithmic approaches to investigate DNA methylation in the context of tumor evolution and heterogeneity remains limited. Aiming to address critical gaps in computational cancer research, this dissertation presents novel computational frameworks for analyzing DNA methylation at both single-cell and bulk levels and offers insights into methylation-based tumor heterogeneity, tumor evolutionary dynamics, and cellular composition in tumor samples for characterization of the complex epigenetic landscape of tumors. Chapter 2 and Chapter 3 introduce Sgootr (Single-cell Genomic methylatiOn tumOr Tree Reconstruction), the first distance-based computational method to jointly select tumor lineage-informative CpG sites and reconstruct tumor lineages from single-cell methylation data. Sgootr lays the groundwork for understanding tumor evolution through the lens of single-cell methylation profiles. Motivated by the need highlighted in Chapter 2 to overcome imbalances in single-cell methylation data across patient samples for interpretable comparative patient analysis, Chapter 4 presents FALAFL (FAir muLti-sAmple Feature seLection).
With integer linear programming (ILP) serving as its algorithmic backbone, FALAFL provides a fast and reliable solution to fairly select CpG sites across different single-cell methylation patient samples to optimally represent the entire patient cohort and identify reliable tumor lineage-informative CpG sites. Finally, Chapter 5 shifts the scope from single-cell to bulk tissue contexts and introduces Qombucha (Quadratic prOgraMming Based tUmor deConvolution with cell HierArchy), which is designed to tackle the challenges of bulk tissue analysis by inferring the methylation profiles of progenitor brain cells and determining cell type composition in bulk glioblastoma (GBM) samples. The work presented in this dissertation demonstrates the power of algorithmic and data science approaches to tackle some of the most pressing challenges in understanding the complexity of cancer epigenomics. With novel computational tools addressing current limitations in methylation data analysis, this work paves the way for further research in tumor evolution, personalized cancer treatment, and biomarker discovery. Overall, the computational frameworks and findings presented here bridge the gap between complex molecular data and clinically meaningful insights in the battle against cancer.
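The fairness objective behind FALAFL's site selection can be illustrated with a toy. Note the assumptions loudly: the actual method uses integer linear programming; this sketch swaps in a simple greedy heuristic, and the coverage numbers and sample names are made up.

```python
def fair_select(coverage, k):
    """Pick k CpG sites maximizing the worst-off sample's total coverage.

    coverage: dict site -> dict sample -> fraction of cells with a reliable
              measurement at that site (toy numbers, not real data).
    """
    samples = {smp for site in coverage.values() for smp in site}
    chosen, totals = [], {smp: 0.0 for smp in samples}
    for _ in range(k):
        # Greedily add the site that most improves the minimum per-sample total
        # (a max-min fairness criterion; FALAFL optimizes this exactly via ILP).
        def score(site):
            return min(totals[smp] + coverage[site].get(smp, 0.0) for smp in samples)
        best = max((s for s in coverage if s not in chosen), key=score)
        chosen.append(best)
        for smp in samples:
            totals[smp] += coverage[best].get(smp, 0.0)
    return chosen

# cg01 and cg02 each favor one patient sample; cg03 covers both evenly,
# so a max-min criterion picks it first.
sites = fair_select(
    {"cg01": {"p1": 0.9, "p2": 0.1},
     "cg02": {"p1": 0.1, "p2": 0.9},
     "cg03": {"p1": 0.5, "p2": 0.5}},
    k=2,
)
```

The greedy version is only an approximation; an exact ILP formulation guarantees the optimum but needs a solver.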
  • Item
    Developing and Measuring Latent Constructs in Text
    (2024) Hoyle, Alexander Miserlis; Resnik, Philip; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Constructs---like inflation, populism, or paranoia---are of fundamental concern to social science. Constructs are the vocabulary over which theory operates, and so a central activity is the development and measurement of latent constructs from observable data. Although the social sciences comprise fields with different epistemological norms, they share a concern for valid operationalizations that transparently map between data and measure. Economists at the US Bureau of Labor Statistics, for example, follow a hundred-page handbook to sample the egg prices that constitute the Consumer Price Index; clinical psychologists rely on suites of psychometric tests to diagnose schizophrenia. In many fields, this observable data takes the form of language: as a social phenomenon, language data can encode many of the latent social constructs that people care about. Commensurate with both increasing sophistication in language technologies and amounts of available data, there has thus emerged a "text-as-data" paradigm aimed at "amplifying and augmenting" the analyses that compose research. At the same time, Natural Language Processing (NLP), the field from which analysis tools originate, has often remained separate from real-world problems and guiding theories---at least when it comes to social science. Instead, it focuses on atomized tasks under the assumption that progress on low-level language aspects will generalize to higher-level problems that involve overlapping elements. This dissertation focuses on NLP methods and evaluations that facilitate the development and measurement of latent constructs from natural language, while remaining sensitive to social sciences' need for interpretability and validity.
  • Item
    Representation Learning for Reinforcement Learning: Modeling Non-Gaussian Transition Probabilities with a Wasserstein Critic
    (2024) Tse, Ryan; Zhang, Kaiqing; Electrical Engineering; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Reinforcement learning algorithms depend on effective state representations when solving complex, high-dimensional environments. Recent methods learn state representations using auxiliary objectives that aim to capture relationships between states that are behaviorally similar, meaning states that lead to similar future outcomes under optimal policies. These methods learn explicit probabilistic state transition models and compute distributional distances between state transition probabilities as part of their measure of behavioral similarity. This thesis presents a novel extension to several of these methods that directly learns the 1-Wasserstein distance between state transition distributions by exploiting the Kantorovich-Rubinstein duality. This method eliminates parametric assumptions about the state transition probabilities while providing a smoother estimator of distributional distances. Empirical evaluation demonstrates improved sample efficiency over some of the original methods and a modest increase in computational cost per sample. The results establish that relaxing theoretical assumptions about state transition modeling leads to more flexible and robust representation learning while maintaining strong performance characteristics.
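For intuition about the quantity being learned: the thesis estimates W1 via the Kantorovich-Rubinstein dual with a 1-Lipschitz neural critic, but in one dimension the 1-Wasserstein distance between equal-size empirical distributions has a closed form (sort both sample sets and average the absolute differences), which this small sketch computes. The sample values are arbitrary.

```python
def wasserstein1_1d(xs, ys):
    """W1 between two equal-size 1-D empirical distributions.

    In 1-D, the optimal transport plan matches the i-th smallest sample of one
    distribution to the i-th smallest of the other.
    """
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Shifting every sample by 1 moves the whole distribution by 1, so W1 = 1.
d = wasserstein1_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

The neural-critic approach generalizes this to high-dimensional distributions where no closed form exists, at the price of optimizing over the space of (approximately) 1-Lipschitz functions.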
  • Item
    Inertially Constrained Ruled Surfaces for Visual Odometry
    (2024) Zhu, Chenqi; Aloimonos, Yiannis; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In computer vision, camera egomotion is typically solved with visual odometry techniques that rely on feature extraction from a sequence of images and computation of the optical flow. This, however, requires a point-to-point correspondence between two consecutive frames, which can be costly to compute, and its varying accuracy greatly affects the quality of the estimated motion. Attempts have been made to bypass the difficulties originating from the correspondence problem by adopting line features and fusing other sensors (event camera, IMU), but many of these still heavily rely on feature detectors. If the camera observes a straight line as it moves, the image of that line sweeps a surface: a ruled surface, whose shape encodes information about the egomotion. This research presents a novel algorithm to estimate 3D camera egomotion from scenes represented by ruled surfaces. Constraining the egomotion with inertial measurements from an onboard IMU sensor greatly reduces the dimensionality of the solution space.
  • Item
    Parallel and External-Memory Algorithms for Large-Scale Genomic Sequence-Graph and -Index Construction
    (2024) Khan, Jamshed; Patro, Robert; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Rapid developments in the throughput and affordability of modern genome-sequencing technologies have made the generation of billions of sequencing reads from biological samples highly time- and cost-efficient. Concurrent development of efficient algorithms and tools for genome assembly have also vastly expanded the number of available assembled genome sequences. This fast-paced and sustained increase in available genomic data has resulted in situations where, in many cases, computational data analysis has become more resource-expensive than the experiments generating these data. This is particularly true since genomic data are generated once, but are typically stored indefinitely and subsequently reanalyzed many times. As such, the development of algorithmic approaches to process these data in a fundamentally scalable manner has been a key focus of modern computational genomics. This dissertation focuses on developing scalable, parallel algorithms and data structures for compressing, organizing, and indexing high-throughput and large-scale genomic data using external memory, in ways that increase the scale at which existing analyses can be performed. Toward these objectives, a mathematical object called the de Bruijn graph—and the associated data structure and variants—have become a compact data representation of increasing utility across computational genomics. Genomic data are expensive in terms of space, and building a compact representation of sequences of interest while retaining relevant features is a first step in many computational genomics pipelines. The majority of this dissertation focuses on construction algorithms for such a representation—a compact lossless variant of the de Bruijn graph, used to represent sets of genomic sequences.
This dissertation first presents a state-of-the-art parallel algorithm, Cuttlefish, to construct the contracted de Bruijn graph from genome references with superior performance to pre-existing methods—where we propose a novel modeling approach for each graph vertex as an individual finite-state automaton, and treat the graph as a collection of automata. Cuttlefish is applicable to a specific class of input data, namely assembled genome sequences. The dissertation then presents our subsequent work, where we extend this method to be more general and enable it to take any form of input. We developed a new state-of-the-art algorithm, Cuttlefish 2—through generalization of the earlier algorithm—to also facilitate construction from sequencing read sets as input, which is a fundamentally harder problem due to structural properties of raw sequencing data. Both Cuttlefish and Cuttlefish 2 construct the de Bruijn graph without source-annotations for the vertices. The dissertation then presents Cuttlefish 3, an even faster, more memory-frugal, and further scalable algorithm to construct the contracted de Bruijn graph, while also incorporating source-annotation information (i.e. inverted-indexing) into the graph. Orthogonal to the strategy of working with the whole graph at all times, Cuttlefish 3 adopts an established paradigm of partitioning the original graph into subgraphs, contracting the subgraphs separately, and then merging the local solutions appropriately—with novel algorithmic contributions in key steps. It speeds up the construction process of the local contractions non-trivially, proposes a novel algorithm to join the local solutions based on parallel tree contraction, and presents a graph sparsification strategy to vastly reduce the number of vertices to be tracked when inverted-indexing the graph.
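To make the object being constructed concrete, here is a minimal in-memory sketch of the contraction step: extract k-mers from sequences as vertices, then stitch maximal non-branching paths into single strings (unitigs). This is only a toy for intuition; Cuttlefish achieves the same result at scale with per-vertex finite-state automata and external memory, and the input sequences and k=3 here are made up.

```python
def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def compact_dbg(seqs, k):
    """Toy contracted de Bruijn graph: return the sorted list of unitigs."""
    nodes = {km for s in seqs for km in kmers(s, k)}
    # Edges connect k-mers overlapping by k-1 characters.
    succ = {n: {m for m in nodes if m[:k - 1] == n[1:]} for n in nodes}
    pred = {n: {m for m in nodes if m[1:] == n[:k - 1]} for n in nodes}
    unitigs, seen = [], set()
    for n in sorted(nodes):
        # Skip vertices in the middle of a non-branching path; paths start
        # where the in-degree or the predecessor's out-degree is not 1.
        if n in seen or (len(pred[n]) == 1 and len(succ[next(iter(pred[n]))]) == 1):
            continue
        path = [n]
        while len(succ[path[-1]]) == 1:
            nxt = next(iter(succ[path[-1]]))
            if len(pred[nxt]) != 1 or nxt in seen or nxt == n:
                break
            path.append(nxt)
        seen.update(path)
        # Glue overlapping k-mers into one unitig string.
        unitigs.append(path[0] + "".join(p[-1] for p in path[1:]))
    return sorted(unitigs)

u = compact_dbg(["ACGTT"], k=3)
```

A single non-branching read contracts back into one unitig, while shared k-mers between reads introduce branch points that terminate unitigs, which is why raw read sets are a harder input than assembled references.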
Finally, the dissertation presents a simple and highly parallel algorithm, CaPS-SA, to construct the classic Suffix Array data structure to index genomic sequences, advancing the state-of-the-art parallel solutions for modern architectures.
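For readers unfamiliar with the data structure: a suffix array is the lexicographic order of all suffix start positions of a text. The naive single-threaded sketch below only shows the definition; CaPS-SA's contribution is constructing the same array with a highly parallel sort, and the "banana" example is an illustration, not from the dissertation.

```python
def suffix_array(text):
    """Naive O(n^2 log n) construction: sort suffix start positions
    by the suffixes they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])

# Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana,
# starting at positions 5, 3, 1, 0, 4, 2 respectively.
sa = suffix_array("banana")
```

Once built, the array supports binary-search pattern lookup over the text, which is what makes it a standard genomic index.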
  • Item
    Practical Cryptography for Blockchains: Secure Protocols with Minimal Trust
    (2024) Glaeser, Noemi; Katz, Jonathan; Malavolta, Giulio; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In 2008, Satoshi Nakamoto introduced Bitcoin, the first digital currency without a trusted authority whose security is maintained by a decentralized blockchain. Since then, a plethora of decentralized applications have been proposed utilizing blockchains as a public bulletin board. This growing popularity has put pressure on the ecosystem to prioritize scalability at the expense of trustlessness and decentralization. This work explores the role cryptography has to play in the blockchain ecosystem to improve performance while maintaining minimal trust and strong security guarantees. I discuss a new paradigm for scalability, called naysayer proofs, which sits between optimistic and zero-knowledge approaches. Next, I cover two on-chain applications: First, I show how to improve the security of a class of coin mixing protocols by giving a formal security treatment of the construction paradigm and patching the security of an existing, insecure protocol. Second, I show how to construct practical on-chain protocols for a large class of elections and auctions which simultaneously offer fairness and non-interactivity without relying on a trusted third party. Finally, I look to the edges of the blockchain and formalize new design requirements for the problem of backing up high-value but rarely-used secret keys, such as those used to secure the reserves of a cryptocurrency exchange, and develop a protocol which efficiently meets these new challenges. All of these works will be deployed in practice or have seen interest from practitioners. These examples show that advanced cryptography has the potential to meaningfully nudge the blockchain ecosystem towards increased security and reduced trust.
  • Item
    Everything Efficient All at Once - Compressing Data and Deep Networks
    (2024) Girish, Sharath; Shrivastava, Abhinav; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this thesis, we examine the efficiency of deep networks and data, both of which are widely used in various computer vision/AI applications and are ubiquitous in today's information age. As deep networks continue to grow exponentially in size, improving their efficiency in terms of size and computation becomes necessary for deploying across various mobile/small devices with hardware constraints. Data efficiency is equally important due to the memory and network speed bottlenecks when transmitting and storing data, which is also being created and transmitted at an exponential rate. In this work, we explore in detail various approaches to improve the efficiency of deep networks, as well as perform compression of various forms of data content. Efficiency of deep networks involves two major aspects: size, or the memory required to store deep networks on disk, and computation, or the number of operations/time taken to execute the network. The first work analyzes sparsity for computation reduction in the context of vision tasks which involve a large pretraining stage followed by downstream task finetuning. We show that task-specific sparse subnetworks are more efficient than generalized sparse subnetworks, which are denser and do not transfer very well. We analyze several behaviors of training sparse networks for various vision tasks. While efficient, this sparsity only reduces computation in theory and requires dedicated hardware for practical deployment. We therefore develop a framework for simultaneously reducing size and computation by utilizing a latent quantization framework along with regularization losses. We compress convolutional networks by more than an order of magnitude in size while maintaining accuracy and speeding up inference without dedicated hardware. Data can take different forms such as audio, language, image, or video.
We develop approaches for improving the compression and efficiency of various forms of visual data, which take up the bulk of global network traffic as well as storage. This consists of 2D images or videos and, more recently, their 3D equivalents of static/dynamic scenes which are becoming popular for immersive AR/VR applications, scene understanding, 3D-aware generative modeling, and so on. To achieve data compression, we utilize Implicit Neural Representations (INRs) which represent data signals in terms of deep network weights. We transform the problem of data compression into network compression, thereby learning efficient data representations. We first develop an algorithm for compression of 2D videos via autoregressive INRs whose weights are compressed by utilizing the latent quantization framework. We then focus on learning a general-purpose INR which can compress different forms of data such as 2D images/videos and can potentially be extended to the audio or language domain as well. This can be extended to compression of 3D objects and scenes as well. Finally, while INRs can represent 3D information, they are slow to train and render, which limits various real-time 3D applications. We utilize 3D Gaussian Splatting (3D-GS), a form of explicit representation for 3D scenes or objects. 3D-GS is quite fast to train and render, but consumes large amounts of memory and is especially inefficient for modeling dynamic scenes or 3D videos. We first develop a framework for efficiently training and compressing 3D-GS for static scenes. We achieve large reductions in storage memory, runtime memory, training and rendering time costs while maintaining high reconstruction quality. Next, we extend to dynamic scenes or 3D videos, developing an online streamable framework for 3D-GS.
We learn per-frame 3D-GS and learn/transmit only the residuals for 3D-GS attributes achieving large reductions in per-frame storage memory for online streamable 3D-GS while also reducing training time costs and maintaining high rendering speeds and reconstruction quality.
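The residual idea behind the streamable framework can be sketched in a few lines: send the first frame's attributes in full, then transmit only per-frame differences and reconstruct by accumulation. This is an illustrative toy under stated assumptions: the flat integer attribute lists (e.g. quantized Gaussian positions or colors) and their values are invented, and the real system compresses many more attributes per Gaussian.

```python
def encode(frames):
    """frames: list of per-frame lists of (quantized) Gaussian attributes.
    Returns the first frame in full plus per-frame residuals."""
    stream = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        stream.append([c - p for p, c in zip(prev, cur)])  # residuals only
    return stream

def decode(stream):
    """Reconstruct every frame by accumulating residuals onto the base frame."""
    frames = [stream[0]]
    for resid in stream[1:]:
        frames.append([p + r for p, r in zip(frames[-1], resid)])
    return frames

# Three toy frames of quantized attributes; consecutive frames differ little,
# so residuals are small and compress well.
frames = [[10, 20, 30], [11, 20, 25], [12, 21, 25]]
stream = encode(frames)
```

Because consecutive frames of a dynamic scene are highly correlated, the residuals are mostly near zero, which is what yields the large per-frame storage reductions.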
  • Item
    Designing Human-Centered AI Tools with Interactive Visualization
    (2024) Hoque, Md Naimul; Kraus, Kari; Information Studies; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Human-centered AI (HCAI), rather than replacing the human, puts the user in the driver's seat of so-called human-centered AI-infused tools (HCAI tools): interactive software tools that amplify, augment, empower, and enhance human performance using AI models. In this dissertation, I discuss how interactive visualization can be a key enabling technology for creating such human-centered AI tools. Visualization has already been shown to be a fundamental component in explainable AI models, and coupling this with data-driven, semantic, and unified interaction feedback loops will enable a human-centered approach for bridging AI models and human users. To validate this approach, I first interviewed HCI, AI, and Visualization experts to define the characteristics of HCAI tools. I then discuss the design and development process of four HCAI tools powered by visualization. I conclude by outlining the design guidelines learned from the design process of the tools and research directions for designing future HCAI tools.
  • Item
    Analytics of Configuration Management for System Administration
    (2024) Katz, Yehuda; Agrawala, Ashok K; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    System administrators are usually trusted to be experts on the systems they control, yet security breaches and performance problems can often be traced to improper configuration by these same administrators. These configuration mistakes aren’t made deliberately and certainly not maliciously, but are usually due to a lack of information about the consequences and interactions of these settings. This problem becomes more apparent as the complexity of the software being configured grows and as the role of system administrator is taken on by more people with less time to develop a complete understanding of the systems they control. We call this *Uninformed Configuration*. There is a blind spot in existing scientific research when it comes to understanding the effects of configuration changes on system performance and security, which, if well understood, would allow for informed configuration management. We present a new way to analyze and understand the effects of configuration management. We define a clear division between the operations of a program that are controlled by configuration and the operations of a program that are affected by the data the program is processing. This allows us to make more accurate inferences about how changing a configuration *knob* will affect the overall security and performance of the system. We build on existing static analysis tools and control flow representations originally designed for compiler optimization to build a clear picture of the effects of configuration changes. We refine the concept of understanding program execution paths with a control plane and data plane by focusing on the effects of configuration changes as a part of the control plane. We provide a method for communicating the importance of each configuration knob to a system administrator using a standardized ranking and scoring system.
We also apply these methods to configuration knobs with known performance and security effects in two commonly used pieces of software. Finally, we discuss several future avenues of scientific research and practical work which will carry these ideas further to improve the state of configuration management.
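The ranking-and-scoring idea can be illustrated with a toy. Everything here is an assumption for illustration: the knob names, the edge counts, and the scoring rule; the dissertation derives knob importance from static control-flow analysis, not from a simple edge count like this.

```python
def rank_knobs(guarded_edges, total_edges):
    """Rank configuration knobs by a 0-100 impact score.

    guarded_edges: dict knob -> number of control-flow edges whose execution
                   depends on that knob's value (hypothetical static-analysis
                   output).
    total_edges:   total control-flow edges in the program.
    """
    scores = {k: round(100 * n / total_edges) for k, n in guarded_edges.items()}
    # Present knobs to the administrator from most to least impactful.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_knobs(
    {"ssl_ciphers": 40, "worker_processes": 30, "log_level": 10},
    total_edges=200,
)
```

A standardized score of this shape lets administrators compare knobs across different pieces of software, even when the underlying analyses differ.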