Computer Science Research Works
Permanent URI for this collection: http://hdl.handle.net/1903/1593
Item: Full-length messenger RNA sequences greatly improve genome annotation
(Genome Biology, 2002-05-30)
Haas, Brian J; Volfovsky, Natalia; Town, Christopher D; Troukhan, Maxim; Alexandrov, Nickolai; Feldman, Kenneth A; Flavell, Richard B; White, Owen; Salzberg, Steven L.

Background: Annotation of eukaryotic genomes is a complex endeavor that requires the integration of evidence from multiple, often contradictory, sources. With the ever-increasing amount of genome sequence data now available, methods for accurate identification of large numbers of genes have become urgently needed. In an effort to create a set of very high-quality gene models, we used the sequence of 5,000 full-length gene transcripts from Arabidopsis to re-annotate its genome. We have mapped these transcripts to their exact chromosomal locations and, using alignment programs, have created gene models that provide a reference set for this organism.

Results: Approximately 35% of the transcripts indicated that previously annotated genes needed modification, and 5% of the transcripts represented newly discovered genes. We also discovered that multiple transcription initiation sites appear to be much more common than previously known, and we report numerous cases of alternative mRNA splicing. We include a comparison of different alignment software and an analysis of how the transcript data improved the previously published annotation.

Conclusions: Our results demonstrate that sequencing of large numbers of full-length transcripts followed by computational mapping greatly improves identification of the complete exon structures of eukaryotic genes. In addition, we are able to find numerous introns in the untranslated regions of the genes.

Item: Evidence for symmetric chromosomal inversions around the replication origin in bacteria
(Genome Biology, 2000-12-04)
Eisen, Jonathan A.; Heidelberg, John F.; White, Owen; Salzberg, Steven L.

Background: Whole-genome comparisons can provide great insight into many aspects of biology. Until recently, however, comparisons were mainly possible only between distantly related species. Complete genome sequences are now becoming available from multiple sets of closely related strains or species.

Results: By comparing the recently completed genome sequences of Vibrio cholerae, Streptococcus pneumoniae and Mycobacterium tuberculosis to those of closely related species (Escherichia coli, Streptococcus pyogenes and Mycobacterium leprae, respectively), we have identified an unusual and previously unobserved feature of bacterial genome structure. Scatterplots of the conserved sequences (both DNA and protein) between each pair of species produce a distinct X-shaped pattern, which we call an X-alignment. The key feature of these alignments is that they have symmetry around the replication origin and terminus; that is, the distance of a particular conserved feature (DNA or protein) from the replication origin (or terminus) is conserved between closely related pairs of species. Statistically significant X-alignments are also found within some genomes, indicating that there is symmetry about the replication origin for paralogous features as well.

Conclusions: The most likely mechanism of generation of X-alignments involves large chromosomal inversions that reverse the genomic sequence symmetrically around the origin of replication. The finding of these X-alignments between many pairs of species suggests that chromosomal inversions around the origin are a common feature of bacterial genome evolution.
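The geometry behind an X-alignment can be illustrated with a small simulation. The sketch below is not the authors' method or data: it assumes a toy circular genome with the replication origin at position 0, and the function names, genome length, and inversion counts are all hypothetical. It applies a few inversions that span the origin symmetrically and shows that each feature keeps its distance from the origin, so every point in the scatterplot of old versus new positions falls on either the diagonal or the anti-diagonal.

```python
import random


def origin_distance(pos, genome_len):
    """Shortest distance from the origin (position 0) on a circular genome."""
    return min(pos, genome_len - pos)


def invert_around_origin(pos, half_width, genome_len):
    """Apply one inversion whose segment extends half_width to each side of the origin.

    A feature inside the inverted segment is reflected to the other side of
    the origin; a feature outside it is left where it was. Strand changes are
    ignored in this toy model.
    """
    if origin_distance(pos, genome_len) <= half_width:
        return (genome_len - pos) % genome_len
    return pos


def simulate_x_alignment(n_features=200, genome_len=1_000_000,
                         n_inversions=5, seed=0):
    """Return (ancestral_pos, derived_pos) pairs after several symmetric inversions."""
    rng = random.Random(seed)
    ancestral = [rng.randrange(genome_len) for _ in range(n_features)]
    derived = list(ancestral)
    for _ in range(n_inversions):
        half_width = rng.randrange(1, genome_len // 2)
        derived = [invert_around_origin(p, half_width, genome_len)
                   for p in derived]
    return list(zip(ancestral, derived))


if __name__ == "__main__":
    genome_len = 1_000_000
    pairs = simulate_x_alignment(genome_len=genome_len)
    on_diagonal = sum(1 for x, y in pairs if x == y)
    print(f"{on_diagonal} of {len(pairs)} features on the diagonal; "
          f"the rest lie on the anti-diagonal")
```

Plotting the returned pairs would give the two arms of the X. Real genomes also accumulate other rearrangements, insertions, and deletions, which this sketch deliberately leaves out.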