Automated ensemble assembly and validation of microbial genomes

dc.contributor.author: Koren, Sergey
dc.contributor.author: Treangen, Todd J
dc.contributor.author: Hill, Christopher M
dc.contributor.author: Pop, Mihai
dc.contributor.author: Phillippy, Adam M
dc.date.accessioned: 2021-09-17T15:02:20Z
dc.date.available: 2021-09-17T15:02:20Z
dc.date.issued: 2014-05-03
dc.description.abstract: The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs. [en_US]
dc.description.uri: https://doi.org/10.1186/1471-2105-15-126
dc.identifier: https://doi.org/10.13016/4pfi-yvf6
dc.identifier.citation: Koren, S., Treangen, T.J., Hill, C.M. et al. Automated ensemble assembly and validation of microbial genomes. BMC Bioinformatics 15, 126 (2014). [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/27875
dc.language.iso: en_US [en_US]
dc.publisher: Springer Nature [en_US]
dc.relation.isAvailableAt: College of Computer, Mathematical & Physical Sciences [en_us]
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_us]
dc.relation.isAvailableAt: Biology [en_us]
dc.relation.isAvailableAt: University of Maryland (College Park, MD) [en_us]
dc.subject: Genome Assembly [en_US]
dc.subject: Validation Tool [en_US]
dc.subject: Good Assembly [en_US]
dc.subject: Validation Metrics [en_US]
dc.subject: Multiple Assembly [en_US]
dc.title: Automated ensemble assembly and validation of microbial genomes [en_US]
dc.type: Article [en_US]

Files

Original bundle
Name: 1471-2105-15-126.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.57 KB
Format: Item-specific license agreed upon to submission