Computer Science

Permanent URI for this community: http://hdl.handle.net/1903/2224

Search Results

  • Item
    Some Statistical and Dynamical Models for the Analysis of Microbial Ecosystems and their Genomic Data
    (2019) Muthiah, Senthilkumar; Corrada Bravo, Héctor; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Embedded within their genetic makeup and ecology, microbes harbor unparalleled stories on natural selection, evolution and biomedicine. In modern biology, such stories are elucidated through rigorous interrogation of microbial ecosystems with a variety of theoretical and experimental techniques. These range from abstract, isolated mathematical models to high-resolution sequencing technologies that probe every single nucleotide of a cell's DNA. Inferences obtained this way are markedly sensitive to unforeseen technical variability introduced during an experiment, and are limited by the tractability and robustness of the models in generating sound hypotheses. We have developed statistical and computational tools to advance statistical inference for microbial genomics by overcoming a subset of technical biases, and have explored certain interesting cases of microbial interactions and their evolution by developing tractable mathematical models.
    Compositional bias induced by the sequencing machine. A DNA sequencing machine produces only percentage measurements (the fraction of molecules of a given type) of the DNA molecules in its input. When contrasting measurements from different inputs, one therefore obtains confounded inferences on absolute concentrations (molecules per unit volume). We theoretically analyze this compositional bias problem with significant generality, and use the analysis to develop an empirical Bayes approach that solves it under certain assumptions, with particular emphasis on microbial sequencing technologies.
    Suicidal attributes of prokaryotic adaptive immunity. The recently discovered CRISPR systems provide the first examples of bacterial and archaeal adaptive immune systems operating against invading viruses over ecological time scales. Equally surprising as their adaptive nature is their ability to induce high rates of host autoimmunity. We theoretically analyze the ecological and evolutionary dynamics of such a costly defense mechanism in simplified models of prokaryote-phage coevolution. We show that by allowing for regulated post-infection activation, CRISPRs can function by exploiting a dual defense strategy of abortive infection and anti-viral resistance.
    Additional statistical and analytic extensions for some related questions on clustering and multi-resolution analysis also appear.
  • Item
    A comparative evaluation of sequence classification programs
    (2012-05-10) Bazinet, Adam L.; Cummings, Michael P.
    Background: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying in their basic algorithmic approach to classification, some methods screen sequence reads for 'barcoding genes' like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis.
    Results: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known.
    Conclusions: We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observed some general trends and patterns that will be useful to researchers who use sequence classification programs.
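
The compositional bias described in the first abstract above can be illustrated with a small arithmetic sketch. The taxon names and concentrations below are hypothetical, chosen only to show why proportions confound inferences about absolute concentrations. This is not the empirical Bayes correction developed in the thesis.

```python
def to_proportions(absolute_counts):
    """Convert absolute concentrations into the fractions a sequencer would report."""
    total = sum(absolute_counts.values())
    return {taxon: count / total for taxon, count in absolute_counts.items()}


# Hypothetical absolute concentrations (molecules per unit volume).
sample_a = {"taxon_1": 100, "taxon_2": 100, "taxon_3": 100}
# Only taxon_1 changes (a 4-fold increase); taxon_2 and taxon_3 are unchanged.
sample_b = {"taxon_1": 400, "taxon_2": 100, "taxon_3": 100}

props_a = to_proportions(sample_a)
props_b = to_proportions(sample_b)

# Fold changes in absolute concentration (the quantity of interest).
true_fold = {t: sample_b[t] / sample_a[t] for t in sample_a}
# Fold changes computed from proportions (what the sequencing data show).
observed_fold = {t: props_b[t] / props_a[t] for t in sample_a}
# Ratio of observed to true fold change for each taxon.
bias = {t: observed_fold[t] / true_fold[t] for t in sample_a}

for taxon in sample_a:
    print(f"{taxon}: true {true_fold[taxon]:.2f}, "
          f"observed {observed_fold[taxon]:.2f}, bias {bias[taxon]:.2f}")
```

In this toy case the unchanged taxa appear to halve, and the 4-fold increase appears as a 2-fold increase. Every observed fold change is off by the same factor, the ratio of the two samples' total concentrations (300/600), which the proportions alone do not reveal.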