Full-length messenger RNA sequences greatly improve genome annotation

dc.contributor.author: Haas, Brian J
dc.contributor.author: Volfovsky, Natalia
dc.contributor.author: Town, Christopher D
dc.contributor.author: Troukhan, Maxim
dc.contributor.author: Alexandrov, Nickolai
dc.contributor.author: Feldmann, Kenneth A
dc.contributor.author: Flavell, Richard B
dc.contributor.author: White, Owen
dc.contributor.author: Salzberg, Steven L.
dc.date.accessioned: 2008-06-18T16:33:19Z
dc.date.available: 2008-06-18T16:33:19Z
dc.date.issued: 2002-05-30
dc.description.abstract: Background: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome sequence data now available, methods for accurate identification of large numbers of genes have become urgently needed. In an effort to create a set of very high-quality gene models, we used the sequence of 5,000 full-length gene transcripts from Arabidopsis to re-annotate its genome. We have mapped these transcripts to their exact chromosomal locations and, using alignment programs, have created gene models that provide a reference set for this organism. Results: Approximately 35% of the transcripts indicated that previously annotated genes needed modification, and 5% of the transcripts represented newly discovered genes. We also discovered that multiple transcription initiation sites appear to be much more common than previously known, and we report numerous cases of alternative mRNA splicing. We include a comparison of different alignment software and an analysis of how the transcript data improved the previously published annotation. Conclusions: Our results demonstrate that sequencing of large numbers of full-length transcripts followed by computational mapping greatly improves identification of the complete exon structures of eukaryotic genes. In addition, we are able to find numerous introns in the untranslated regions of the genes. (en)
dc.format.extent: 125806 bytes
dc.format.mimetype: application/pdf
dc.identifier.citation: Full-length messenger RNA sequences greatly improve genome annotation. B.J. Haas, N. Volfovsky, C.D. Town, M. Troukhan, N. Alexandrov, K.A. Feldmann, R.B. Flavell, O. White, and S.L. Salzberg. Genome Biology 3:6 (2002), research0029.1-12. (en)
dc.identifier.uri: http://hdl.handle.net/1903/8007
dc.language.iso: en_US (en)
dc.publisher: Genome Biology (en)
dc.relation.isAvailableAt: College of Computer, Mathematical & Physical Sciences (en_us)
dc.relation.isAvailableAt: Computer Science (en_us)
dc.relation.isAvailableAt: Digital Repository at the University of Maryland (en_us)
dc.relation.isAvailableAt: University of Maryland (College Park, MD) (en_us)
dc.subject: eukaryotic genomes (en)
dc.subject: genome sequence (en)
dc.subject: DNA (en)
dc.subject: gene models (en)
dc.subject: mRNA (en)
dc.subject: exons (en)
dc.subject: introns (en)
dc.title: Full-length messenger RNA sequences greatly improve genome annotation (en)
dc.type: Article (en)

Files

Original bundle
Name: Full-length.pdf
Size: 122.86 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.8 KB
Format: Item-specific license agreed upon to submission