University of Maryland Libraries
Digital Repository at the University of Maryland

    Comparative and Computational Methods for Microbial Genomics

    Wood_umd_0117E_15060.pdf (1.334Mb)

    Date
    2014
    Author
    Wood, Derrick
    Advisor
    Salzberg, Steven L
    Pop, Mihai
    Abstract
    Through the study of genomic sequences, researchers are able to learn much about the workings of life. As sequencing technology has improved over the past decade, the number of genomes that have been assembled has grown exponentially, and the amount of sequence generated by sequencing machines can easily number in the billions or even trillions of nucleotides for a single project. This rise in the amount of information present requires informatics approaches to correctly and efficiently analyze the data. One common approach has been to use comparative methods, which use sequence similarity to infer functional or evolutionary relationships between sequences. This dissertation uses comparative methods to improve existing records of genomic data, and introduces a novel computational approach to the problem of taxonomic sequence classification.

    The first part of this dissertation uses two approaches involving pairwise and multiple sequence alignments to find and correct errors in the public records of microbial genomes. Through alignment to sets of genes with known function, we show that thousands of genes have been mistakenly omitted from our public records. Our analysis of these genes shows a tendency for short genes to be omitted, and reveals that genes are more frequently omitted by organizations with less experience in annotating genomes. We also use multiple alignments of protein sequences to improve the annotation of start positions of genes, in some cases restoring hundreds of nucleotides to the genes' records. Through analysis of our results, we also found a link between a high use of rare start codons and a high rate of erroneously annotated start sites.

    The final part of this dissertation presents a method involving exact alignment of short sequences to perform rapid taxonomic sequence classification. By using the existing concept of minimizers to increase CPU cache utilization, we have created a tool capable of performing taxonomic classification with a sensitivity that is comparable to existing methods, a precision that surpasses all existing methods, and a speed that is over 900 times faster than the fastest existing classification approach.
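
For readers unfamiliar with minimizers, the following is a minimal Python sketch of the general concept, not the dissertation's implementation. It assumes the simplest definition (the lexicographically smallest length-m substring of a k-mer) and ignores reverse complements. The function names `minimizer` and `kmer_minimizers`, and the example sequence, are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def minimizer(kmer: str, m: int) -> str:
    """Return the lexicographically smallest length-m substring of kmer.

    Assumption: plain lexicographic order on the forward strand only.
    """
    if not 0 < m <= len(kmer):
        raise ValueError("m must be between 1 and len(kmer)")
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def kmer_minimizers(seq: str, k: int, m: int) -> List[Tuple[str, str]]:
    """Pair every k-mer of seq, in sequence order, with its minimizer."""
    return [
        (seq[i:i + k], minimizer(seq[i:i + k], m))
        for i in range(len(seq) - k + 1)
    ]


# Hypothetical example: overlapping k-mers frequently share a minimizer.
pairs = kmer_minimizers("ACGTTGCATGCA", k=8, m=3)
shared = [mz for _, mz in pairs]
```

Because adjacent k-mers overlap in all but one position, they often have the same minimizer. A lookup structure that stores k-mers grouped by minimizer can therefore serve consecutive queries from nearby memory, which is one plausible way minimizers could improve cache utilization.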
    URI
    http://hdl.handle.net/1903/15260
    Collections
    • Computer Science Theses and Dissertations
    • UMD Theses and Dissertations

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     
