Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
Search Results
METHOD VALIDATION AND DEVELOPMENT FOR THE METAGENOMIC EXPLORATION OF MICROBIAL COMMUNITIES (2022)
Commichaux, Seth; Pop, Mihai; Rand, Hugh; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Our world is inhabited and shaped by diverse and complex microbial communities which we are only beginning to characterize and understand. With the advent of affordable high-throughput sequencing, the study of the genomic content of microbial communities, metagenomics, has accelerated our understanding of their impact on human and environmental health. The increasing number of datasets produced by metagenomic studies provides many opportunities for novel bioinformatic analyses and for the development of computational methods. However, careful benchmarking and validation are also important undertakings to ensure the integrity of methods and research in such a rapidly developing field. Here, we explored several problems in metagenomics by benchmarking existing methods and technologies, developing new methods, recommending best practices, and highlighting opportunities for future work. First, microbial gene catalogs document and organize the genes found in microbial communities and provide a reference for the standardized analysis of metagenomic data. Although gene catalogs are commonly used to explore the intersection between microbiomes, humans, and ecosystems, the methods used to construct them and their effectiveness for metagenomic analyses had not been critically evaluated. Our analysis highlighted important limitations of gene catalogs and opportunities for future research, and allowed us to recommend best practices. Second, we assessed whether nanopore long-read sequencing could expedite the accurate reconstruction of a pathogen genome from a microbial community. The investigation of foodborne illness outbreaks routinely uses short-read whole genome sequencing of pure culture pathogen colonies.
However, culturing is a bottleneck and short reads cannot span all bacterial genomic repeats, often leading to fragmented assemblies. Our results showed that the integration of long-read sequencing could expedite the public health response by reconstructing complete pathogen genomes from a microbial community after limited culturing. Additionally, our evaluation of state-of-the-art assembly tools identified biases and areas for improvement. Third, we describe taxaTarget, a supervised learning approach for the taxonomic classification of microeukaryotes in metagenomic data. Metagenomics has been underutilized for microeukaryotes due to the many computational challenges they present. Existing tools often implement universal sequence similarity cutoffs which ignore that sequences can evolve at different rates and, thus, have different discriminatory power. We show that a data-driven approach to determining classification thresholds can result in higher sensitivity and precision than existing tools. Fourth, we explored the use of horizontally transferred plasmids to relate an outbreak strain to the microbiome of a suspected environmental source. The investigation of the 2020 red onion outbreak recovered the outbreak strain from patients but not the farms implicated as the likely source of contamination. Our analysis identified highly similar plasmids in the outbreak strain and environmental isolates collected from the farms, which supported a connection between the outbreak strain and the implicated farms. Additionally, we highlighted the need for more detailed and accurate metadata, more extensive environmental sampling, and a better understanding of plasmid molecular evolution before such analyses can be added to the public health response.