Novel Methods for Metagenomic Analysis

dc.contributor.advisor: Pop, Mihai (en_US)
dc.contributor.author: White, James Robert (en_US)
dc.contributor.department: Applied Mathematics and Scientific Computation (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2010-07-02T05:47:55Z
dc.date.available: 2010-07-02T05:47:55Z
dc.date.issued: 2010 (en_US)
dc.description.abstract: By sampling the genetic content of microbes at the nucleotide level, metagenomics has rapidly established itself as the standard in characterizing the taxonomic diversity and functional capacity of microbial populations throughout nature. The decreasing cost of sequencing technologies and the simultaneous increase in throughput per run have given scientists the ability to deeply sample highly diverse communities on a reasonable budget. The Human Microbiome Project is representative of the flood of sequence data that will arrive in the coming years. Despite these advancements, there remains the significant challenge of analyzing massive metagenomic datasets to draw appropriate biological conclusions. This dissertation is a collection of novel methods developed for improved analysis of metagenomic data: (1) We begin with Figaro, a statistical algorithm that quickly and accurately infers and trims vector sequence from large Sanger-based read sets without prior knowledge of the vector used in library construction. (2) Next, we perform a rigorous evaluation of methodologies used to cluster environmental 16S rRNA sequences into species-level operational taxonomic units, and discover that many published studies utilize highly stringent parameters, resulting in overestimation of microbial diversity. (3) To assist in comparative metagenomics studies, we have created Metastats, a robust statistical methodology for comparing large-scale clinical datasets with up to thousands of subjects. Given a collection of annotated metagenomic features (e.g. taxa, COGs, or pathways), Metastats determines which features are differentially abundant between two populations. (4) Finally, we report on a new methodology that employs the generalized Lotka-Volterra model to infer microbe-microbe interactions from longitudinal 16S rRNA data.
It is our hope that these methods will enhance standard metagenomic analysis techniques to provide better insight into the human microbiome and microbial communities throughout our world. To assist metagenomics researchers and those developing methods, all software described in this thesis is open-source and available online. (en_US)
dc.identifier.uri: http://hdl.handle.net/1903/10292
dc.subject.pqcontrolled: Biology, Bioinformatics (en_US)
dc.subject.pqcontrolled: Applied Mathematics (en_US)
dc.subject.pqcontrolled: Computer Science (en_US)
dc.subject.pquncontrolled: 16S rRNA (en_US)
dc.subject.pquncontrolled: human microbiome (en_US)
dc.subject.pquncontrolled: Lotka-Volterra (en_US)
dc.subject.pquncontrolled: metagenomics (en_US)
dc.subject.pquncontrolled: operational taxonomic units (en_US)
dc.subject.pquncontrolled: vector trimming (en_US)
dc.title: Novel Methods for Metagenomic Analysis (en_US)
dc.type: Dissertation (en_US)
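
Illustrative note on item (4) of the abstract: the generalized Lotka-Volterra model describes the abundance x_i of each taxon by dx_i/dt = x_i (r_i + sum_j A_ij x_j), where r_i is a growth rate and A_ij is the effect of taxon j on taxon i. The sketch below only simulates this forward model for two taxa; it is not the dissertation's inference method, and the function names, parameter values, step size, and the choice of a fourth-order Runge-Kutta integrator are all assumptions made for illustration.

```python
def glv_rhs(x, r, A):
    """Right-hand side of the generalized Lotka-Volterra model:
    dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n)))
            for i in range(n)]


def rk4_step(x, r, A, dt):
    """Advance the abundances by one classical Runge-Kutta (RK4) step."""
    k1 = glv_rhs(x, r, A)
    k2 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], r, A)
    k3 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], r, A)
    k4 = glv_rhs([xi + dt * k for xi, k in zip(x, k3)], r, A)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate_glv(x0, r, A, dt=0.01, steps=5000):
    """Return the list of abundance vectors, including the initial state."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], r, A, dt))
    return trajectory


# Hypothetical parameters: two taxa that compete with each other.
growth = [1.0, 0.8]            # r_i, intrinsic growth rates (assumed)
interactions = [[-1.0, -0.5],  # A_ij, effect of taxon j on taxon i (assumed)
                [-0.4, -1.0]]
trajectory = simulate_glv([0.1, 0.1], growth, interactions)
final = trajectory[-1]
```

With these assumed values, setting r + A x = 0 gives the coexistence point x = (0.75, 0.5), and the simulated abundances settle there. Inferring interactions from longitudinal 16S rRNA data amounts to the inverse problem: estimating r and A from observed abundance time series.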

Files

Original bundle
Name: White_umd_0117E_11123.pdf
Size: 5.38 MB
Format: Adobe Portable Document Format