Reference-guided assembly of metagenomes

dc.contributor.advisor: Pop, Mihai (en_US)
dc.contributor.author: Cepeda, Victoria P (en_US)
dc.contributor.department: Computer Science (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2021-02-14T06:32:14Z
dc.date.available: 2021-02-14T06:32:14Z
dc.date.issued: 2020 (en_US)
dc.description.abstract: Microorganisms play an important role in all of the Earth's ecosystems and are critical for the health of humans [1], plants, and animals. Most microbes are not easily cultured [2]; however, metagenomics, the analysis of organismal DNA sequences obtained directly from an environmental sample, enables the study of these microorganisms. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. The two main paradigms for this process are de novo assembly (i.e., reconstructing genomes directly from the read data) and reference-guided assembly (i.e., reconstructing genomes using the genomes of closely related organisms). Because the latter paradigm has a high computational cost, due to the mapping of tens of millions of reads to thousands of full genome sequences, metagenomic studies have primarily relied on the former. However, the increased availability of high-throughput sequencing technologies has generated thousands of bacterial genomes, making reference-guided assembly a valuable resource regardless of its computational cost. This study therefore describes a novel metagenome assembly approach, called MetaCompass, that combines reference-guided assembly and de novo assembly. It is organized in the following stages: (i) selecting reference genomes from a database using metagenomic taxonomic classification software that combines gene and genome comparison methods, achieving species- and strain-level resolution; (ii) performing reference-guided assembly in a new manner that uses the minimum set cover principle to remove redundancy in the metagenome read mapping while performing consensus calling; and (iii) performing de novo assembly using the reads that have not been mapped to any reference genome.
We show that MetaCompass improves the most common metrics used to evaluate assembly quality (contiguity, consistency, and reference-based metrics) for both synthetic and real datasets, such as those gathered in the Human Microbiome Project (HMP) [3], and that it facilitates the assembly of low-abundance microorganisms retrieved with the reference-guided approach. Lastly, we used our HMP assembly results to characterize the relative advantages and limitations of de novo and reference-guided assembly approaches, thereby providing guidance on analytical strategies for characterizing the human-associated microbiota. (en_US)
dc.identifier: https://doi.org/10.13016/6mnt-dtxm
dc.identifier.uri: http://hdl.handle.net/1903/26790
dc.language.iso: en (en_US)
dc.subject.pqcontrolled: Computer science (en_US)
dc.subject.pqcontrolled: Bioinformatics (en_US)
dc.subject.pquncontrolled: Metagenomic assembly (en_US)
dc.subject.pquncontrolled: Metagenomic classification (en_US)
dc.subject.pquncontrolled: Metagenomics (en_US)
dc.subject.pquncontrolled: Microbiome (en_US)
dc.subject.pquncontrolled: Reference-guided metagenomic assembly (en_US)
dc.title: Reference-guided assembly of metagenomes (en_US)
dc.type: Dissertation (en_US)
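
Illustrative note: the abstract states that the reference-guided stage uses the minimum set cover principle to remove redundancy in the read mapping. The sketch below shows one common way to approximate a minimum set cover, the greedy heuristic, applied to a toy mapping of reads to reference genomes. It is not taken from MetaCompass; the function name `greedy_reference_cover`, the `hits` input layout, and the reference and read identifiers are all assumptions made for illustration.

```python
def greedy_reference_cover(hits):
    """Greedy approximation of a minimum set cover.

    hits: dict mapping a (hypothetical) reference genome id to the set of
    read ids that align to it. Returns a list of reference ids that together
    cover every mapped read, chosen greedily.
    """
    # All reads that map to at least one reference.
    uncovered = set().union(*hits.values())
    chosen = []
    while uncovered:
        # Pick the reference that explains the most still-uncovered reads.
        # Iterating over sorted ids makes tie-breaking deterministic.
        best = max(sorted(hits), key=lambda ref: len(hits[ref] & uncovered))
        gained = hits[best] & uncovered
        if not gained:
            break  # defensive: no reference adds coverage
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy input: refB is redundant because refA already covers its reads.
hits = {
    "refA": {"r1", "r2", "r3"},
    "refB": {"r2", "r3"},
    "refC": {"r4"},
}
selected = greedy_reference_cover(hits)
print(selected)
```

In this toy case the redundant reference `refB` is dropped because every read it recruits is already explained by `refA`. The greedy heuristic is only an approximation of the optimal cover, and a real pipeline would also have to weigh alignment quality and coverage depth, which this sketch ignores.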

Files

Original bundle

Name: Cepeda_umd_0117E_21187.pdf
Size: 1.84 MB
Format: Adobe Portable Document Format