Novel Methods for Metagenomic Analysis
White, James Robert
MetadataShow full item record
By sampling the genetic content of microbes at the nucleotide level, metagenomics has rapidly established itself as the standard in characterizing the taxonomic diversity and functional capacity of microbial populations throughout nature. The decreasing cost of sequencing technologies and the simultaneous increase of throughput per run has given scientists the ability to deeply sample highly diverse communities on a reasonable budget. The Human Microbiome Project is representative of the flood of sequence data that will arrive in the coming years. Despite these advancements, there remains the significant challenge of analyzing massive metagenomic datasets to make appropriate biological conclusions. This dissertation is a collection of novel methods developed for improved analysis of metagenomic data: (1) We begin with Figaro, a statistical algorithm that quickly and accurately infers and trims vector sequence from large Sanger-based read sets without prior knowledge of the vector used in library construction. (2) Next, we perform a rigorous evaluation of methodologies used to cluster environmental 16S rRNA sequences into species-level operational taxonomic units, and discover that many published studies utilize highly stringent parameters, resulting in overestimation of microbial diversity. (3) To assist in comparative metagenomics studies, we have created Metastats, a robust statistical methodology for comparing large-scale clinical datasets with up to thousands of subjects. Given a collection of annotated metagenomic features (e.g. taxa, COGs, or pathways), Metastats determines which features are differentially abundant between two populations. (4) Finally, we report on a new methodology that employs the generalized Lotka-Volterra model to infer microbe-microbe interactions from longitudinal 16S rRNA data. It is our hope that these methods will enhance standard metagenomic analysis techniques to provide better insight into the human microbiome and microbial communities throughout our world. To assist metagenomics researchers and those developing methods, all software described in this thesis is open-source and available online.