Serendipitous discovery of Wolbachia genomes in multiple Drosophila species

Date
2005
Author
Salzberg, Steven L.
Dunning Hotopp, Julie C.
Delcher, Arthur L.
Pop, Mihai
Smith, Douglas R.
Eisen, Michael B.
Nelson, William C.
Citation
Serendipitous discovery of Wolbachia genomes in multiple Drosophila species. S.L. Salzberg, J.C. Dunning Hotopp, A.L. Delcher, M. Pop, D.R. Smith, M.B. Eisen, and W.C. Nelson. Genome Biology 2005, 6:R23.
Abstract
Background: The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale
genome sequencing projects. The existence of these data offers scientists the possibility of
discovering additional genomic sequences beyond those originally sequenced. In particular, if the
source DNA for a sequencing project came from a species that was colonized by another organism,
then the project may yield substantial amounts of genomic DNA, including near-complete genomes,
from the symbiotic or parasitic organism.
Results: By searching the publicly available repository of DNA sequencing trace data, we
discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different
species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences
with partial matches to a previously sequenced Wolbachia strain and assembled those sequences
using customized software. For one of the three new species, the data recovered were sufficient
to produce an assembly that covers more than 95% of the genome; for a second species the data
produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80%
of the genome; and for the third species the data cover approximately 6-7% of the genome.
Conclusions: The results of this study reveal an unexpected benefit of depositing raw data in a
central genome sequence repository: new species can be discovered within these data. The
differences between these three new Wolbachia genomes and the previously sequenced strain
revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes.
The three new genomes, with annotation, have been deposited in GenBank.
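
The read-extraction step described in the Results can be illustrated with a minimal sketch. This is not the authors' customized software: it stands in exact k-mer sharing for the partial-match search, and the function names (`kmers`, `reverse_complement`, `screen_reads`) and the parameters `k` and `min_shared` are hypothetical choices made for illustration only.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def screen_reads(reads, reference, k=12, min_shared=1):
    """Keep reads that share at least min_shared k-mers with the reference.

    Assumption: exact k-mer sharing is a crude stand-in for a real
    alignment-based partial-match search. Both strands of the reference
    are indexed because shotgun reads can come from either strand.
    """
    ref_kmers = kmers(reference, k) | kmers(reverse_complement(reference), k)
    kept = {}
    for name, seq in reads.items():
        if len(kmers(seq, k) & ref_kmers) >= min_shared:
            kept[name] = seq
    return kept


if __name__ == "__main__":
    # Toy data, invented for this example; not real Wolbachia sequence.
    reference = "ACGTTGCAAGGCTTAACCGGTATCGATCGGA"
    reads = {
        "read_fwd": reference[5:20] + "TTTT",               # partial forward match
        "read_rev": reverse_complement(reference[10:25]),   # reverse-strand match
        "read_host": "T" * 20,                              # unrelated read
    }
    print(sorted(screen_reads(reads, reference, k=8)))
```

In practice the retained reads would then be passed to an assembler; a larger `k` or a higher `min_shared` trades sensitivity for fewer spurious host-derived matches.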