Full-length messenger RNA sequences greatly improve genome annotation
Date
2002-05-30
Authors
Haas, Brian J
Volfovsky, Natalia
Town, Christopher D
Troukhan, Maxim
Alexandrov, Nickolai
Feldmann, Kenneth A
Flavell, Richard B
White, Owen
Salzberg, Steven L
Citation
Full-length messenger RNA sequences greatly improve genome annotation. B.J. Haas, N. Volfovsky, C.D. Town, M. Troukhan, N. Alexandrov, K.A. Feldmann, R.B. Flavell, O. White, and S.L. Salzberg. Genome Biology 3:6 (2002), research0029.1-12.
Abstract
Background: Annotation of eukaryotic genomes is a complex endeavor that requires the
integration of evidence from multiple, often contradictory, sources. With the ever-increasing
amount of genome sequence data now available, methods for accurate identification of large
numbers of genes have become urgently needed. In an effort to create a set of very high-quality
gene models, we used the sequence of 5,000 full-length gene transcripts from Arabidopsis to
re-annotate its genome. We have mapped these transcripts to their exact chromosomal locations
and, using alignment programs, have created gene models that provide a reference set for this
organism.
Results: Approximately 35% of the transcripts indicated that previously annotated genes needed
modification, and 5% of the transcripts represented newly discovered genes. We also discovered
that multiple transcription initiation sites appear to be much more common than previously
known, and we report numerous cases of alternative mRNA splicing. We include a comparison of
different alignment software and an analysis of how the transcript data improved the previously
published annotation.
Conclusions: Our results demonstrate that sequencing of large numbers of full-length
transcripts followed by computational mapping greatly improves identification of the complete
exon structures of eukaryotic genes. In addition, we are able to find numerous introns in the
untranslated regions of the genes.
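
As an illustration of the last point, the following is a minimal sketch of how introns in untranslated regions could be identified once a transcript has been aligned to the genome. It is not the authors' pipeline: the function names, the coordinate convention (1-based, inclusive, forward strand), and the example coordinates are all assumptions made for illustration. Introns are taken as the gaps between consecutive aligned exon blocks, and an intron is counted as a UTR intron when it lies wholly outside the coding span.

```python
from typing import List, Tuple

# Assumed convention: 1-based, inclusive genomic coordinates, forward strand.
Interval = Tuple[int, int]


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exon blocks of one alignment."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start - prev_end > 1  # skip abutting blocks (no gap)
    ]


def utr_introns(exons: List[Interval], cds_start: int, cds_end: int) -> List[Interval]:
    """Return introns lying wholly outside the coding span [cds_start, cds_end]."""
    return [
        (start, end)
        for start, end in introns_from_exons(exons)
        if end < cds_start or start > cds_end
    ]


# Hypothetical transcript alignment: four exon blocks, coding span 320..680.
example_exons = [(100, 150), (300, 420), (600, 700), (900, 1000)]
all_introns = introns_from_exons(example_exons)
outside_cds = utr_introns(example_exons, cds_start=320, cds_end=680)
print(all_introns)
print(outside_cds)
```

In this hypothetical example the first and last introns fall in the 5' and 3' untranslated regions respectively, while the middle intron interrupts the coding sequence. A reverse-strand gene would need its coordinates handled accordingly.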