Novel Methods for Metagenomic Analysis

Date

2010

Abstract

By sampling the genetic content of microbes at the nucleotide level, metagenomics has rapidly established itself as the standard in characterizing the taxonomic diversity and functional capacity of microbial populations throughout nature. The decreasing cost of sequencing technologies and the simultaneous increase in throughput per run have given scientists the ability to deeply sample highly diverse communities on a reasonable budget. The Human Microbiome Project is representative of the flood of sequence data that will arrive in the coming years. Despite these advancements, there remains the significant challenge of analyzing massive metagenomic datasets to draw appropriate biological conclusions. This dissertation is a collection of novel methods developed for improved analysis of metagenomic data: (1) We begin with Figaro, a statistical algorithm that quickly and accurately infers and trims vector sequence from large Sanger-based read sets without prior knowledge of the vector used in library construction. (2) Next, we perform a rigorous evaluation of methodologies used to cluster environmental 16S rRNA sequences into species-level operational taxonomic units, and discover that many published studies utilize highly stringent parameters, resulting in overestimation of microbial diversity. (3) To assist in comparative metagenomics studies, we have created Metastats, a robust statistical methodology for comparing large-scale clinical datasets with up to thousands of subjects. Given a collection of annotated metagenomic features (e.g., taxa, COGs, or pathways), Metastats determines which features are differentially abundant between two populations. (4) Finally, we report on a new methodology that employs the generalized Lotka-Volterra model to infer microbe-microbe interactions from longitudinal 16S rRNA data. It is our hope that these methods will enhance standard metagenomic analysis techniques to provide better insight into the human microbiome and microbial communities throughout our world. To assist metagenomics researchers and those developing methods, all software described in this thesis is open-source and available online.
