Biology
Permanent URI for this community: http://hdl.handle.net/1903/11810
3 results
Search Results
Item Exploiting sparseness in de novo genome assembly (Springer Nature, 2012-04-19) Ye, Chengxi; Sam Ma, Zhanshan; Cannon, Charles H; Pop, Mihai; Yu, Douglas W
The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the observed k-mers as nodes and the links between these nodes allows the de novo assembly of even moderately-sized genomes (~500 M) on a typical laptop computer. We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.

Item Mapping of pigmentation QTL on an anchored genome assembly of the cichlid fish, Metriaclima zebra (Springer Nature, 2013-04-27) O’Quin, Claire T; Drilea, Alexi C; Conte, Matthew A; Kocher, Thomas D
Pigmentation patterns are one of the most recognizable phenotypes across the animal kingdom. They play an important role in camouflage, communication, mate recognition and mate choice. Most progress on understanding the genetics of pigmentation has been achieved via mutational analysis, with relatively little work done to understand variation in natural populations. Pigment patterns vary dramatically among species of cichlid fish from Lake Malawi, and are thought to be important in speciation. In this study, we crossed two species, Metriaclima zebra and M. mbenjii, that differ in several aspects of their body and fin color. We genotyped 798 SNPs in 160 F2 male individuals to construct a linkage map that was used to identify quantitative trait loci (QTL) associated with the pigmentation traits of interest.
We also used the linkage map to anchor portions of the M. zebra genome assembly. We constructed a linkage map consisting of 834 markers in 22 linkage groups that spanned over 1,933 cM. QTL analysis detected one QTL each for dorsal fin xanthophores, caudal fin xanthophores, and pelvic fin melanophores. Dorsal fin and caudal fin xanthophores share a QTL on LG12, while pelvic fin melanophores have a QTL on LG11. We used the mapped markers to anchor 66.5% of the M. zebra genome assembly. Within each QTL interval we identified several candidate genes that might play a role in pigment cell development. This is one of a few studies to identify QTL for natural variation in fish pigmentation. The QTL intervals we identified did not contain any pigmentation genes previously identified by mutagenesis studies in other species. We expect that further work on these intervals will identify new genes involved in pigment cell development in natural populations.

Item Automated ensemble assembly and validation of microbial genomes (Springer Nature, 2014-05-03) Koren, Sergey; Treangen, Todd J; Hill, Christopher M; Pop, Mihai; Phillippy, Adam M
The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible.
To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
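
As a rough illustration of the ensemble idea in the abstract above (run several assemblers, score each result, keep the best), the following Python sketch ranks candidate assemblies by a single contiguity metric, N50. This is not iMetAMOS's actual scoring, which the abstract says combines multiple validation metrics; N50 is used here only as a simple stand-in, and the names `n50` and `select_assembly` are hypothetical.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not contig_lengths:
        return 0
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def select_assembly(assemblies: Dict[str, List[int]]) -> str:
    """Pick the assembly name with the highest N50 (illustrative metric only)."""
    return max(assemblies, key=lambda name: n50(assemblies[name]))


# Hypothetical contig lengths from two assemblers run on the same reads.
candidates = {
    "assembler_a": [2, 3, 4, 5, 6],
    "assembler_b": [10, 10],
}
best = select_assembly(candidates)
```

A real pipeline would weigh several metrics (for example, read-mapping consistency or contamination checks) rather than contiguity alone, since a high N50 can also result from misassembly.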