Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756

Search Results

Now showing 1 - 6 of 6
  • Data-driven algorithms for characterizing microbial communities
    (2021) Shah, Nidhi; Pop, Mihai; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Complex microbial communities play a crucial role in environmental and human health. Traditionally, microbes have been studied by isolating and culturing them, missing organisms that cannot grow under standard laboratory conditions and losing information about microbe-microbe interactions. With affordable high-throughput sequencing, a new field called metagenomics has emerged that studies the genomic content of a microbial community as a whole. Metagenomics enables researchers to characterize complex microbial communities; however, many computational challenges remain in the downstream analyses of large sequencing datasets. Here, we explore some fundamental problems in metagenomics and present simple algorithms, along with open-source software tools that implement them. In the first section, we focus on using a reference database of known organisms (and genomic segments within them) to taxonomically classify sequences and estimate the abundances of taxa in a metagenomic sample. We developed a “BLAST outlier detection” algorithm that identifies significant outliers within database search results. We extended this method and developed ATLAS, which uses significant database hits to group sequences in the database into partitions. These partitions capture the extent of ambiguity within the classification of a sample. Besides taxonomically classifying sequences, we also explored the problem of taxonomic abundance profiling, i.e., estimating the abundance of different species in the community. We describe TIPP2, a marker-gene-based abundance profiling method that combines phylogenetic placement with statistical techniques to control classification accuracy. TIPP2 includes an updated set of reference packages and several algorithmic improvements over the original TIPP method. Next, we explore how to reconstruct genomes from metagenomic samples. Despite advances in metagenome assembly algorithms, assembling reads into complete genomes remains computationally challenging because of repeated sequences within and among genomes, uneven abundances of organisms, sequencing errors, and strain-level variation. To improve upon fragmented assemblies, researchers use a strategy called binning, which involves clustering together genomic fragments that likely originate from an individual organism. We describe Binnacle, a tool that explicitly accounts for scaffold information in binning. We describe novel algorithms for estimating scaffold-level depth of coverage and show that variation-aware scaffolders help further improve the completeness and quality of the resulting metagenomic bins. Finally, we explore how to organize the enormous sets of sequence data generated by the surge of metagenomic studies. There have been recent efforts to organize and document the genes found in microbial communities in “microbial gene catalogs”. Although gene catalogs are commonly used, they have not been critically evaluated for their effectiveness as a basis for metagenomic analyses. We investigated one such catalog, focusing on both the approach used to construct it and its effectiveness when used as a reference for microbiome studies. Our results highlight important limitations of the approach used to construct the catalog and call into question the broad usefulness of gene catalogs. We also recommend best practices for the construction and use of gene catalogs in microbiome studies and highlight opportunities for future research. With the ever-increasing amount of data generated by metagenomic studies, we believe our ideas, algorithms, and software tools are well-timed to meet the needs of the community.
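    As a toy illustration of the idea behind the “BLAST outlier detection” algorithm described above, the sketch below flags the top-scoring database hits that are separated from the remaining hits by a large bitscore gap. This is a hedged simplification: the function name, input format, and fixed gap threshold are all hypothetical, and the published method uses a proper statistical test rather than a fixed gap.

```python
def top_hit_outliers(hits, gap_fraction=0.1):
    """Toy illustration of BLAST outlier detection: return the
    top-scoring hits that are separated from the remaining hits by a
    large bitscore gap.

    `hits` is a list of (subject_id, bitscore) pairs; `gap_fraction` is
    a hypothetical threshold (the published method uses a statistical
    test instead of a fixed gap).
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    best = ranked[0][1]
    for i in range(1, len(ranked)):
        # A drop of more than gap_fraction * best between consecutive
        # scores marks the boundary of the outlier group.
        if ranked[i - 1][1] - ranked[i][1] > gap_fraction * best:
            return ranked[:i]
    return ranked  # no clear gap: all hits are equally plausible

# Example: the first two hits stand out from the rest.
print(top_hit_outliers([("A", 480), ("B", 475), ("C", 300), ("D", 290)]))
```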
  • Efficient Algorithms for Coordinated Motion in Shared Spaces
    (2020) Dasler, Philip; Mount, David M; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The steady development of autonomous systems motivates a number of interesting algorithmic problems. These systems, such as self-driving cars, must contend with far more complex and dynamic environments than the factory-floor robots of the past. This dissertation seeks to identify optimization problems that are simple enough to analyze formally, yet realistic enough to contribute to the eventual design of systems operating in real-world, physical spaces. With that in mind, this work examines three problems in particular: automated vehicles at unregulated traffic crossings, a smart network for city-wide traffic flow, and an online organizational scheme for automated warehouses. First, the Traffic Crossing Problem is introduced, in which a set of n axis-aligned vehicles moving monotonically in the plane must reach their goal positions within a time limit, subject to a maximum speed limit. It is shown that this problem is NP-complete, and efficient algorithms for two special cases are given. Next, motivated by the problem of computing a periodic schedule for traffic lights in an urban transportation network, the problem of Circulation with Modular Demands is introduced: a novel variant of the well-known minimum-cost circulation problem in directed networks in which vertex demand values are taken from the integers modulo λ, for some integer λ ≥ 2. This modular circulation problem is solvable in polynomial time when λ = 2, but is NP-hard when λ ≥ 3; for the latter case, a polynomial-time approximation algorithm is provided. Finally, a theoretical model for organizing portable storage units in a warehouse subject to an online sequence of access requests is proposed. Because the request frequencies of stored products are unknown, the warehouse must be arranged with care. Analogous to virtual-memory systems, the more popular and oft-requested an item is, the more efficient its retrieval should be. Two formulations are considered, depending on the number of access points to which storage units must be brought, and algorithms that are O(1)-competitive with respect to an optimal algorithm are given. Additionally, in the case of a single access point, the solution herein is asymptotically optimal with respect to density.
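    The warehouse-organization model above is closely analogous to self-organizing data structures such as move-to-front lists. As a hedged illustration of that online flavor (not the dissertation's actual algorithm or its competitive analysis), the sketch below assumes a single access point, charges cost i for fetching the unit in slot i, and reinserts each requested unit at the front so popular units drift toward the cheap slots; the class and method names are hypothetical.

```python
from collections import deque

class WarehouseMTF:
    """Toy single-access-point policy in the spirit of move-to-front:
    slot i costs i to access, and a requested unit is reinserted at the
    front, pushing the others back. An illustration of the online,
    competitive-analysis flavor of the problem, not the dissertation's
    asymptotically density-optimal scheme.
    """
    def __init__(self, units):
        self.slots = deque(units)   # slots[0] is nearest the access point

    def access(self, unit):
        i = self.slots.index(unit)  # distance travelled to fetch the unit
        self.slots.remove(unit)
        self.slots.appendleft(unit) # popular units migrate to cheap slots
        return i + 1                # cost of this access

w = WarehouseMTF(["a", "b", "c", "d"])
costs = [w.access(u) for u in ["d", "d", "a", "d"]]
print(costs)  # [4, 1, 2, 2]: repeated requests for "d" quickly become cheap
```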
  • Directed Graphs: Fixed-Parameter Tractability & Beyond
    (2014) Chitnis, Rajesh; Hajiaghayi, Mohammad; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Most interesting optimization problems on graphs are NP-hard, implying that (unless P=NP) there is no polynomial-time algorithm that solves all instances of an NP-hard problem exactly. However, classical complexity measures the running time as a function of only the overall input size. The paradigm of parameterized complexity was introduced by Downey and Fellows to allow for a more refined, multivariate analysis of the running time. In parameterized complexity, each problem comes with a secondary measure k, called the parameter. The goal of parameterized complexity is to design efficient algorithms for NP-hard problems when the parameter k is small, even if the input size is large. Formally, we say that a parameterized problem is fixed-parameter tractable (FPT) if instances of size n and parameter k can be solved in f(k) · n^{O(1)} time, where f is a computable function that does not depend on n. A parameterized problem belongs to the class XP if instances of size n and parameter k can be solved in f(k) · n^{O(g(k))} time, where f and g are both computable functions. In this thesis we focus on the parameterized complexity of transversal and connectivity problems on directed graphs. This research direction had hitherto been relatively unexplored: the directed versions of problems usually require significantly different and more involved ideas than those for the undirected versions. Furthermore, for directed graphs there are no known algorithmic meta-techniques: for example, there is no known algorithmic analogue of the Graph Minor Theory of Robertson and Seymour for directed graphs. As a result, the fixed-parameter tractability status of the directed versions of several fundamental problems, such as Multiway Cut, Multicut, Subset Feedback Vertex Set, and Odd Cycle Transversal, was open. In the first part of the thesis, we develop the framework of shadowless solutions for a general class of transversal problems in directed graphs. For this class of problems, we reduce the problem of finding a solution in FPT time to that of finding a shadowless solution. Since shadowless solutions have a good (problem-specific) structure, this provides an important first step in the design of FPT algorithms for problems on directed graphs. By understanding the structure of shadowless solutions, we are able to design the first FPT algorithms for the Directed Multiway Cut problem and the Directed Subset Feedback Vertex Set problem. In the second part of the thesis, we present tight bounds on the parameterized complexity of well-studied directed connectivity problems, such as Strongly Connected Steiner Subgraph and Directed Steiner Forest, when parameterized by the number of terminals/terminal pairs. We design new optimal XP algorithms for the aforementioned problems, and also prove matching lower bounds for existing XP algorithms. Most of our hardness results hold even if the underlying undirected graph is planar. Finally, we conclude with some open problems regarding the parameterized complexity of transversal and connectivity problems on directed graphs.
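    To make the f(k) · n^{O(1)} definition concrete, here is the textbook bounded-search-tree algorithm for (undirected) Vertex Cover, a standard FPT example that is not from this thesis: it decides whether a cover of size at most k exists in O(2^k · |E|) time.

```python
def vertex_cover(edges, k):
    """Classic FPT bounded search tree: decide whether the graph given
    by `edges` has a vertex cover of size at most k.

    Each uncovered edge (u, v) forces u or v into the cover, so the
    recursion branches twice and has depth at most k -- the f(k) = 2^k
    factor -- while each level costs time polynomial in the input.
    """
    if not edges:
        return True          # every edge is covered
    if k == 0:
        return False         # edges remain but no budget left
    u, v = edges[0]
    # Branch 1: put u in the cover; Branch 2: put v in the cover.
    rest_u = [e for e in edges if u not in e]
    rest_v = [e for e in edges if v not in e]
    return vertex_cover(rest_u, k - 1) or vertex_cover(rest_v, k - 1)

# A 4-cycle has a vertex cover of size 2 but not of size 1.
cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(vertex_cover(cycle, 2), vertex_cover(cycle, 1))  # True False
```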
  • Search Complexities for HTN Planning
    (2013) Alford, Ronald Wayne; Nau, Dana S; Kuter, Ugur; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Hierarchical Task Network (HTN) planning is the problem of decomposing an initial task into a sequence of executable steps. Often viewed as just a way to encode human knowledge so as to solve classical planning problems faster, HTN planning is in fact more expressive than classical planning, even to the point of being undecidable in the general case. However, HTN planning is not just a way to solve planning problems faster; it is itself a search problem that can benefit from its own distinct search algorithms and heuristics. This dissertation examines the complexities of various HTN planning problem classes in order to motivate the development of heuristic search algorithms for HTN planning that are guaranteed to terminate on a large class of syntactically identifiable problems, as well as domain-independent heuristics for those algorithms to use. This will allow HTN planning to be used in a number of areas where the solvability of a problem is in question, including during the initial development of a domain and for policy generation in non-deterministic planning environments. In particular, this dissertation analyzes two commonly used algorithms for HTN planning and describes the subsets of HTN problems on which these algorithms terminate. This allows us to discuss the run-times of these algorithms and compare the expressivity of the classes of problems they decide. We provide two new HTN algorithms that terminate on a strictly broader and more expressive set of HTN problems. We also analyze the complexity of delete-free HTN planning, an analogue of delete-free classical planning, which is the basis of many classical planning heuristics. We show that delete-free HTN planning is NP-complete, putting the existence of strict-semantics delete-relaxation-based HTN heuristics out of reach for practical purposes. Finally, we provide a translation of a large subset of HTN planning to classical planning, which allows us to use a classical planner as a surrogate for a heuristic HTN planner. Our experiments show that even small and incomplete amounts of HTN knowledge, when translated into PDDL using our algorithm, can greatly improve a classical planner's performance.
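    For readers unfamiliar with HTN planning, the sketch below shows a minimal total-order decomposition loop in the style of SHOP-like planners; it is a toy illustration (world state and preconditions elided, domain names hypothetical), not one of the dissertation's algorithms.

```python
def htn_plan(tasks, primitives, methods, depth=50):
    """Tiny total-order HTN decomposition (SHOP-style, state elided).

    `tasks` is the list of tasks still to achieve, `primitives` the set
    of directly executable steps, and `methods` maps a compound task to
    alternative subtask sequences. Returns a plan (list of primitive
    steps) or None. `depth` guards against the unbounded recursion that
    makes general HTN planning undecidable.
    """
    if depth == 0:
        return None
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in primitives:
        tail = htn_plan(rest, primitives, methods, depth - 1)
        return None if tail is None else [head] + tail
    for subtasks in methods.get(head, []):      # try each method in turn
        plan = htn_plan(list(subtasks) + rest, primitives, methods, depth - 1)
        if plan is not None:
            return plan
    return None

# Hypothetical travel domain: "travel" decomposes into a taxi ride
# (itself decomposed into primitive steps) or walking.
methods = {"travel": [["take-taxi"], ["walk"]],
           "take-taxi": [["call-taxi", "ride", "pay"]]}
print(htn_plan(["travel"], {"walk", "call-taxi", "ride", "pay"}, methods))
```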
  • Learning Techniques in Multi-Armed Bandits
    (2012) Gupta, Neha; Agrawala, Ashok K; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The Multi-Armed Bandit problem is a classic example of the exploration vs. exploitation dilemma, in which a collection of one-armed bandits, each with an unknown but fixed reward probability, is given. The goal is to develop a strategy that results in the arm with the highest reward probability being played, such that the total reward obtained is maximized. Although seemingly a simple problem, solution strategies are important because of their wide applicability in a myriad of areas, such as adaptive routing, resource allocation, clinical trials, and, more recently, the online recommendation of news articles, advertisements, and coupons, to name a few. In this dissertation, we present different types of Bayesian-inference-based bandit algorithms for two-armed and multi-armed bandits, which use order statistics to select the next arm to play. These Bayesian strategies, also known in the literature as the Thompson Method, are shown to perform well over a whole range of parameter values, including very small ones, outperforming UCB and other commonly used strategies. Empirical analysis shows a significant improvement on multiple datasets. In the second part of the dissertation, two types of Successive Reduction (SR) strategies are introduced: 1) Successive Reduction Hoeffding (SRH) and 2) Successive Reduction Order Statistics (SRO). Both use an order-statistics-based sampling method for arm selection and then successively eliminate bandit arms from consideration depending on a confidence threshold. While SRH uses Hoeffding bounds for elimination, SRO uses the probability of an arm being superior to the currently selected arm to measure confidence. The empirical results show that the performance advantage of the proposed SRO scheme increases persistently with the number of bandit arms, while the SRH scheme performs similarly to the pure Thompson Sampling method. In the third part of the dissertation, the assumption of fixed reward probabilities is removed. We model problems where reward probabilities are drifting, and introduce a new method called Dynamic Thompson Sampling (DTS), which adapts the reward probability estimate faster than traditional schemes and thus achieves lower regret. Our empirical results demonstrate that the DTS method outperforms the state-of-the-art techniques, namely pure Thompson Sampling, UCB-Normal, and UCB-f, in the case of dynamic reward probabilities. Furthermore, the performance advantage of the proposed DTS scheme increases persistently with the number of bandit arms. In the last part of the dissertation, we delve into arm-space decomposition and the use of multiple agents in the bandit process. The three most important characteristics of multi-agent systems are: 1) autonomy, meaning agents are completely or partially autonomous; 2) local views, meaning agents are restricted to a local view of information; and 3) decentralization of control, meaning each agent influences only a limited part of the overall decision space. We study and compare centralized vs. decentralized sampling algorithms for Multi-Armed Bandit problems in the context of common-payoff games. In centralized decision making, a central agent maintains a global view of the currently available information and chooses the next arm just as the regular Bayesian algorithm does. In decentralized decision making, each agent maintains a local view of the arms and makes decisions based only on the local information available at its end, without communicating with other agents. Decentralized decision making is modeled as a game-theoretic problem. Our results show that decentralized systems perform well in the cases of both pure and mixed Nash equilibria, and that their performance scales well with an increasing number of arms due to the reduced dimensionality of the space. We thus believe that this dissertation establishes Bayesian Multi-Armed Bandit strategies as among the most prominent strategies in the field of bandits and opens up avenues for interesting new research.
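    As a concrete illustration of the basic Bayesian (Thompson) strategy that the dissertation builds on, the sketch below implements standard Thompson Sampling for Bernoulli bandits with Beta posteriors; the arm probabilities and horizon are made up, and the successive-reduction, dynamic, and multi-agent variants studied in the thesis extend this basic loop rather than reproduce it.

```python
import random

def thompson_sampling(true_probs, horizon=10_000):
    """Basic Thompson Sampling for Bernoulli bandits.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over
    its unknown reward probability; at every step we draw one sample
    per arm and play the argmax, so arms are chosen in proportion to
    the posterior probability that they are best.
    """
    n = len(true_probs)
    wins, losses = [0] * n, [0] * n
    total = 0
    for _ in range(horizon):
        samples = [random.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(n)]
        arm = max(range(n), key=lambda i: samples[i])
        reward = 1 if random.random() < true_probs[arm] else 0
        total += reward
        (wins if reward else losses)[arm] += 1
    return total, wins, losses

random.seed(0)
total, wins, losses = thompson_sampling([0.4, 0.5, 0.6])
# Pull counts concentrate on the best arm (index 2, p = 0.6).
print(total, [w + l for w, l in zip(wins, losses)])
```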
  • Approximation Algorithms for Resource Allocation
    (2011) Saha, Barna; Khuller, Samir; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This thesis is devoted to designing new techniques and algorithms for combinatorial optimization problems arising in various applications of resource allocation. Resource allocation refers to a class of problems in which scarce resources must be distributed among competing agents while maintaining certain optimization criteria. Examples include scheduling jobs on one or multiple machines while maintaining system performance; assigning advertisements to bidders, or items to people, to maximize profit or social fairness; and allocating servers or channels subject to networking requirements. Altogether they comprise a wide variety of combinatorial optimization problems. However, a majority of these problems are NP-hard, and therefore the goal herein is to develop approximation algorithms that approximate the optimal solution as closely as possible in polynomial time. The thesis addresses two main directions. First, we develop several new techniques, predominantly a new linear programming rounding methodology and a constructive aspect of a well-known probabilistic method, the Lovász Local Lemma (LLL). Second, we employ these techniques in applications of resource allocation, obtaining substantial improvements over known results. Our research also spurs new directions of study: we introduce new models for achieving energy efficiency in scheduling and a novel framework for assigning advertisements in cellular networks, both of which lead to a variety of interesting questions. Our linear programming rounding methodology is a significant generalization of two major rounding approaches in the theory of approximation algorithms, namely dependent rounding and the iterative relaxation procedure. Our constructive version of the LLL leads to the first algorithmic results for many combinatorial problems. In addition, it settles a major open question by obtaining a constant-factor approximation algorithm for the Santa Claus problem, an NP-hard resource allocation problem that has received much attention in the last several years. Throughout this thesis, we study a number of applications related to scheduling jobs on unrelated parallel machines, such as provisionally shutting down machines to save energy, selectively dropping outliers to improve system performance, and handling machines with hard capacity bounds on the number of jobs they can process. Hard capacity constraints arise naturally in many other applications and often render a hitherto simple combinatorial optimization problem difficult. In this thesis, we encounter many such instances of hard capacity constraints, namely in the budgeted allocation of advertisements in cellular networks, in overlay network design, and in classical problems such as vertex cover, set cover, and k-median.
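    The constructive version of the Lovász Local Lemma mentioned above is in the spirit of the Moser-Tardos resampling framework. As a hedged sketch of that general framework (not this thesis's specific algorithms or results), here is the classic resampling loop applied to SAT.

```python
import random

def moser_tardos_sat(clauses, num_vars, max_rounds=100_000):
    """Moser-Tardos style resampling, the constructive form of the
    Lovász Local Lemma, applied to SAT: start from a random assignment
    and, while some clause is violated, resample the variables of one
    violated clause. Under LLL-type conditions (each clause sharing
    variables with few others) this terminates quickly in expectation.

    `clauses` is a list of clauses, each a list of nonzero ints, where
    literal v refers to variable |v| and the sign gives its polarity.
    """
    assign = [random.random() < 0.5 for _ in range(num_vars + 1)]

    def violated(clause):
        # A clause is violated iff every one of its literals is false.
        return all(assign[abs(v)] != (v > 0) for v in clause)

    for _ in range(max_rounds):
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign[1:]           # satisfying assignment found
        for v in random.choice(bad):    # resample one violated clause
            assign[abs(v)] = random.random() < 0.5
    return None  # give up (should not happen under LLL conditions)

random.seed(1)
# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(moser_tardos_sat([[1, 2], [-1, 3], [-2, -3]], 3))
```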