A new rhesus macaque assembly and annotation for next-generation sequencing analyses

dc.contributor.author: Zimin, Aleksey V
dc.contributor.author: Cornish, Adam S
dc.contributor.author: Maudhoo, Mnirnal D
dc.contributor.author: Gibbs, Robert M
dc.contributor.author: Zhang, Xiongfei
dc.contributor.author: Pandey, Sanjit
dc.contributor.author: Meehan, Daniel T
dc.contributor.author: Wipfler, Kristin
dc.contributor.author: Bosinger, Steven E
dc.contributor.author: Johnson, Zachary P
dc.contributor.author: Tharp, Gregory K
dc.contributor.author: Marçais, Guillaume
dc.contributor.author: Roberts, Michael
dc.contributor.author: Ferguson, Betsy
dc.contributor.author: Fox, Howard S
dc.contributor.author: Treangen, Todd
dc.contributor.author: Salzberg, Steven L
dc.contributor.author: Yorke, James A
dc.contributor.author: Norgren, Robert B Jr
dc.date.accessioned: 2021-09-14T15:05:51Z
dc.date.available: 2021-09-14T15:05:51Z
dc.date.issued: 2014-10-14
dc.description.abstract: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses. We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies. [en_US]
dc.description.uri: https://doi.org/10.1186/1745-6150-9-20
dc.identifier: https://doi.org/10.13016/h1lv-ygss
dc.identifier.citation: Zimin, A.V., Cornish, A.S., Maudhoo, M.D. et al. A new rhesus macaque assembly and annotation for next-generation sequencing analyses. Biol Direct 9, 20 (2014). [en_US]
dc.identifier.uri: http://hdl.handle.net/1903/27688
dc.language.iso: en_US [en_US]
dc.publisher: Springer Nature [en_US]
dc.relation.isAvailableAt: College of Computer, Mathematical & Natural Sciences [en_us]
dc.relation.isAvailableAt: Physics [en_us]
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_us]
dc.relation.isAvailableAt: University of Maryland (College Park, MD) [en_us]
dc.subject: Macaca mulatta [en_US]
dc.subject: Rhesus macaque [en_US]
dc.subject: Genome [en_US]
dc.subject: Assembly [en_US]
dc.subject: Annotation [en_US]
dc.subject: Transcriptome [en_US]
dc.subject: Next-generation sequencing [en_US]
dc.title: A new rhesus macaque assembly and annotation for next-generation sequencing analyses [en_US]
dc.type: Article [en_US]

Files

Original bundle
Name: 1745-6150-9-20.pdf
Size: 1.53 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.57 KB
Format: Item-specific license agreed upon to submission