Improving and validating computational algorithms for the assembly, clustering, and taxonomic classification of microbial communities
Files
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
Recent high-throughput sequencing technologies have advanced the study of microbial communities; nonetheless, analyzing the resulting large datasets still poses challenges. This dissertation focuses on developing and validating computational algorithms to address these challenges in microbial communities' assembly, clustering, and taxonomic classification.
We first introduce a novel reference-guided metagenomic assembly approach that delivers high-quality assemblies that generally outperform \textit{de novo} assembly in terms of quality without a significant increase in runtime. Next, We propose SCRAPT, an iterative sampling-based algorithm designed to cluster 16S rRNA gene sequences from large datasets efficiently. In addition, we validate a comprehensive set of genome assembly pipelines using Oxford Nanopore sequencing, achieving near-perfect accuracy through the combination of long and short-read polishing tools.
Our research improves the accuracy and efficiency of analyzing complex microbial communities. This dissertation offers insights into the composition and structures of these communities, with potential implications for human, animal, and plant health.