A clustering method for repeat analysis in DNA sequences.

dc.contributor.author: Volfovsky, Natalia
dc.contributor.author: Haas, Brian J.
dc.contributor.author: Salzberg, Steven L.
dc.date.accessioned: 2008-06-18T16:33:36Z
dc.date.available: 2008-06-18T16:33:36Z
dc.date.issued: 2001-08-01
dc.description.abstract: Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats. Results: The resulting software tool collects all repeat classes and outputs summary statistics as well as a file containing multiple sequences (multi fasta) that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences. Conclusions: We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences, and that can organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software (RepeatFinder) should prove helpful in the analysis of repeat structure for both complete and partial genome sequences. [en]
dc.format.extent: 190349 bytes
dc.format.mimetype: application/pdf
dc.identifier.citation: A clustering method for repeat analysis in DNA sequences. N. Volfovsky, B.J. Haas, and S.L. Salzberg. Genome Biology 2:8 (2001), research0027:1-11. [en]
dc.identifier.uri: http://hdl.handle.net/1903/8008
dc.language.iso: en_US [en]
dc.publisher: Genome Biology [en]
dc.relation.isAvailableAt: College of Computer, Mathematical & Physical Sciences [en_us]
dc.relation.isAvailableAt: Computer Science [en_us]
dc.relation.isAvailableAt: Digital Repository at the University of Maryland [en_us]
dc.relation.isAvailableAt: University of Maryland (College Park, MD) [en_us]
dc.subject: genomic sequences [en]
dc.subject: suffix tree [en]
dc.subject: multi fasta [en]
dc.subject: genomes [en]
dc.subject: RepeatFinder [en]
dc.title: A clustering method for repeat analysis in DNA sequences. [en]
dc.type: Article [en]

Files

Original bundle
Name: Clustering.pdf
Size: 185.89 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.8 KB
Format: Item-specific license agreed to upon submission