Algorithms for scalable and efficient population genomics and metagenomics

Microbes strongly affect human health and the ecosystems of which they are a part. Rapid improvements in sequencing technologies, together with their decreasing costs, have revolutionized the field of genomics and enabled important insights into microbial genome biology and microbiomes. However, new tools and approaches are needed to analyze large sets of genomes efficiently and to better associate genomic features with phenotypic characteristics. Here, we built and used several tools for large-scale whole-genome analysis of microbial characteristics that are important for human health, such as antimicrobial resistance and pathogenicity.

Chapters 2 and 3 illustrate the need for, and the challenges of, population genomics in associating antimicrobial resistance with genomic features. Our results highlight important limitations of reference-database-driven analysis for genotype-phenotype association studies and demonstrate the utility of whole-genome population genomics in uncovering novel genomic factors associated with antimicrobial resistance.

Chapter 4 describes PRAWNS, a fast and scalable bioinformatics tool that generates compact pan-genomic features. Existing approaches cannot meet the needs of large-scale whole-genome analyses, either because they do not scale or because the genomic features they generate cannot support a thorough whole-genome assessment. We demonstrate that PRAWNS scales to thousands of genomes and provides a concise collection of genomic features that supports downstream analyses.

In Chapter 5, we assess whether combining long- and short-read sequencing can expedite the accurate reconstruction of a pathogen genome from a microbial community. We describe the challenges of pathogen detection in current foodborne illness outbreak monitoring. Our results show that the recovery of a pathogen genome can be accelerated by combining long- and short-read sequencing after limited culturing of the microbial community. We also evaluated several popular genome assembly approaches and identified areas for improvement.

In Chapter 6, we describe SIMILE, a fast and scalable bioinformatics tool that detects genomic regions shared between several assembled metagenomes. In metagenomics, microbial communities are sequenced directly without culturing. Although metagenomics has furthered our understanding of the microbiome, comparing metagenomic samples is extremely difficult. We describe the need for, and the challenges of, comparing several metagenomic samples and present an approach that facilitates large-scale metagenomic comparisons.