
dc.contributor.advisor: Pop, Mihai [en_US]
dc.contributor.advisor: Corrada Bravo, Héctor [en_US]
dc.contributor.author: Paulson, Joseph Nathaniel [en_US]
dc.date.accessioned: 2015-09-18T05:46:00Z
dc.date.available: 2015-09-18T05:46:00Z
dc.date.issued: 2015 [en_US]
dc.identifier: https://doi.org/10.13016/M2Q63C
dc.identifier.uri: http://hdl.handle.net/1903/16996
dc.description.abstract: High-throughput technologies such as targeted sequencing of marker genes and whole metagenomic shotgun (WMS) sequencing have provided unprecedented insight into microbial communities and the interactions between their members. Statistical inference on these communities is challenging because it must account for an all-too-common limitation of metagenomic datasets: under-sampling. In this dissertation I present novel and robust methods for normalization and differential abundance testing of marker-gene surveys and WMS sequencing experiments. Using these methods I analyze one particular microbial community of interest, the gut microbiota associated with diarrhea. One central problem in almost any metagenomic analysis is under-sampling of the microbial community. In both marker-gene surveys and WMS sequencing data, misinterpreting zero-valued counts can bias mean and variance estimates. Even in very deep sequencing surveys, the nature of the “counting experiment” that is a metagenomic analysis can skew representative population estimates for community members. To address this issue, I characterize the biases that sparsity introduces into association testing in various metagenomic experiments. I developed sparsity-aware methods to 1) control for the variability in sequencing depth with a novel normalization algorithm and 2) associate gene abundance with host phenotypes. The central idea in testing associations is to weight zero values of a gene or taxon according to the posterior probability that it was not observed due to under-sampling. These methods have broad applicability in the analysis of large, relatively sparse data sets and will provide better insight into the biological properties of complex microbial communities and their potential roles in various environmental niches.
In applying these methods to previously unexplored ecosystems, I was able to obtain novel insights into the microbial communities of healthy and diseased children from low-income countries. I analyzed samples from 992 children under five years of age from low-income countries, including The Gambia, Mali, Bangladesh, and Kenya. Approximately half of the samples were from children diagnosed with moderate-to-severe diarrhea. In applying the methods developed, we recovered known diarrhea-causing pathogens, including Escherichia/Shigella and Campylobacter species. We also detected previously unknown associations with disease for several bacteria, including Granulicatella species and Streptococcus mitis/pneumoniae groups. [en_US]
dc.language.iso: en [en_US]
dc.title: NORMALIZATION AND DIFFERENTIAL ABUNDANCE ANALYSIS OF METAGENOMIC BIOMARKER-GENE SURVEYS [en_US]
dc.type: Dissertation [en_US]
dc.contributor.publisher: Digital Repository at the University of Maryland [en_US]
dc.contributor.publisher: University of Maryland (College Park, Md.) [en_US]
dc.contributor.department: Applied Mathematics and Scientific Computation [en_US]
dc.subject.pqcontrolled: Biostatistics [en_US]
dc.subject.pqcontrolled: Bioinformatics [en_US]
dc.subject.pquncontrolled: Differential abundance [en_US]
dc.subject.pquncontrolled: Metagenomics [en_US]
dc.subject.pquncontrolled: Normalization [en_US]