Browsing by Author "Yorke, James A"
Item Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies (Springer Nature, 2014-03-04)
Neale, David B; Wegrzyn, Jill L; Stevens, Kristian A; Zimin, Aleksey V; Puiu, Daniela; Crepeau, Marc W; Cardeno, Charis; Koriabine, Maxim; Holtz-Morris, Ann E; Liechty, John D; Martínez-García, Pedro J; Vasquez-Gross, Hans A; Lin, Brian Y; Zieve, Jacob J; Dougherty, William M; Fuentes-Soriano, Sara; Wu, Le-Shin; Gilbert, Don; Marçais, Guillaume; Roberts, Michael; Holt, Carson; Yandell, Mark; Davis, John M; Smith, Katherine E; Dean, Jeffrey FD; Lorenz, W Walter; Whetten, Ross W; Sederoff, Ronald; Wheeler, Nicholas; McGuire, Patrick E; Main, Doreen; Loopstra, Carol A; Mockaitis, Keithanne; deJong, Pieter J; Yorke, James A; Salzberg, Steven L; Langley, Charles H
The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers.
We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.

Item A new rhesus macaque assembly and annotation for next-generation sequencing analyses (Springer Nature, 2014-10-14)
Zimin, Aleksey V; Cornish, Adam S; Maudhoo, Mnirnal D; Gibbs, Robert M; Zhang, Xiongfei; Pandey, Sanjit; Meehan, Daniel T; Wipfler, Kristin; Bosinger, Steven E; Johnson, Zachary P; Tharp, Gregory K; Marçais, Guillaume; Roberts, Michael; Ferguson, Betsy; Fox, Howard S; Treangen, Todd; Salzberg, Steven L; Yorke, James A; Norgren, Robert B Jr
The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses. We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly.
The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.

Item A whole-genome assembly of the domestic cow, Bos taurus (2009-04-29)
Zimin, Aleksey V; Delcher, Arthur L; Florea, Liliana; Kelley, David R; Schatz, Michael C; Puiu, Daniela; Hanrahan, Finnian; Pertea, Geo; Van Tassell, Curtis P; Sonstegard, Tad S; Marcais, Guillaume; Roberts, Michael; Subramanian, Poorani; Yorke, James A; Salzberg, Steven L
Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected.
Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions. Conclusions: By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome.