Novel Methods for Metagenomic Analysis
Abstract
By sampling the genetic content of microbes at the nucleotide level, metagenomics
has rapidly established itself as the standard in characterizing the taxonomic diversity
and functional capacity of microbial populations throughout nature. The decreasing
cost of sequencing technologies and the simultaneous increase in throughput per run
have given scientists the ability to deeply sample highly diverse communities on a
reasonable budget. The Human Microbiome Project is representative of the flood of
sequence data that will arrive in the coming years. Despite these advancements, there
remains the significant challenge of analyzing massive metagenomic datasets to draw
appropriate biological conclusions. This dissertation is a collection of novel methods
developed for improved analysis of metagenomic data: (1) We begin with Figaro, a
statistical algorithm that quickly and accurately infers and trims vector sequence from
large Sanger-based read sets without prior knowledge of the vector used in library
construction. (2) Next, we perform a rigorous evaluation of methodologies used to
cluster environmental 16S rRNA sequences into species-level operational taxonomic
units, and discover that many published studies utilize highly stringent parameters,
resulting in overestimation of microbial diversity. (3) To assist in comparative
metagenomics studies, we have created Metastats, a robust statistical methodology for
comparing large-scale clinical datasets with up to thousands of subjects. Given a
collection of annotated metagenomic features (e.g., taxa, COGs, or pathways),
Metastats determines which features are differentially abundant between two
populations. (4) Finally, we report on a new methodology that employs the
generalized Lotka-Volterra model to infer microbe-microbe interactions from
longitudinal 16S rRNA data. It is our hope that these methods will enhance standard
metagenomic analysis techniques to provide better insight into the human
microbiome and microbial communities throughout our world. To assist
metagenomics researchers and those developing methods, all software described in
this thesis is open-source and available online.
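
For readers unfamiliar with the model named in item (4), the generalized Lotka-Volterra model is commonly written as dx_i/dt = x_i (r_i + sum_j a_ij x_j), where x_i is the abundance of taxon i, r_i its intrinsic growth rate, and a_ij the effect of taxon j on taxon i. The following is a minimal sketch of the forward model only, assuming a simple forward-Euler integrator; it is not the inference methodology of the dissertation, and the function names, the two-taxon community, and all parameter values are hypothetical and chosen purely for illustration.

```python
def glv_step(x, r, A, dt):
    """One forward-Euler step of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    new_x = []
    for i in range(n):
        # Net per-capita effect of all taxa (including i itself) on taxon i.
        interaction = sum(A[i][j] * x[j] for j in range(n))
        growth = x[i] * (r[i] + interaction)
        # Abundances cannot be negative, so clamp at zero.
        new_x.append(max(x[i] + dt * growth, 0.0))
    return new_x


def simulate_glv(x0, r, A, dt=0.01, steps=5000):
    """Return the list of abundance vectors visited, starting from x0."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        x = glv_step(x, r, A, dt)
        trajectory.append(x)
    return trajectory


# Hypothetical two-taxon community (illustrative values, not from the thesis):
# both taxa limit themselves (negative diagonal) and compete weakly.
r = [1.0, 0.5]
A = [[-1.0, -0.5],
     [-0.2, -1.0]]

trajectory = simulate_glv([0.1, 0.1], r, A)
final = trajectory[-1]
print(final)
```

In this toy setting the off-diagonal entries of A are the microbe-microbe interactions; inferring them from longitudinal 16S rRNA data amounts to estimating r and A from observed trajectories rather than simulating trajectories from known parameters.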