A clustering method for repeat analysis in DNA sequences.

Volfovsky, Natalia; Haas, Brian J.; Salzberg, Steven L.

dc.contributor.author	Volfovsky, Natalia
dc.contributor.author	Haas, Brian J.
dc.contributor.author	Salzberg, Steven L.
dc.date.accessioned	2008-06-18T16:33:36Z
dc.date.available	2008-06-18T16:33:36Z
dc.date.issued	2001-08-01
dc.description.abstract	Background: A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats. Results: The resulting software tool collects all repeat classes and outputs summary statistics as well as a file containing multiple sequences (multi fasta) that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences. Conclusions: We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences, and that can organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software (RepeatFinder) should prove helpful in the analysis of repeat structure for both complete and partial genome sequences.	en
dc.format.extent	190349 bytes
dc.format.mimetype	application/pdf
dc.identifier.citation	A clustering method for repeat analysis in DNA sequences. N. Volfovsky, B.J. Haas, and S.L. Salzberg. Genome Biology 2:8 (2001), research0027:1-11.	en
dc.identifier.uri	http://hdl.handle.net/1903/8008
dc.language.iso	en_US	en
dc.publisher	Genome Biology	en
dc.relation.isAvailableAt	College of Computer, Mathematical & Physical Sciences	en_us
dc.relation.isAvailableAt	Computer Science	en_us
dc.relation.isAvailableAt	Digital Repository at the University of Maryland	en_us
dc.relation.isAvailableAt	University of Maryland (College Park, MD)	en_us
dc.subject	genomic sequences	en
dc.subject	suffix tree	en
dc.subject	multi fasta	en
dc.subject	genomes	en
dc.subject	RepeatFinder	en
dc.title	A clustering method for repeat analysis in DNA sequences.	en
dc.type	Article	en

Files

Original bundle

Name: Clustering.pdf
Size: 185.89 KB
Format: Adobe Portable Document Format

Collections

Computer Science Research Works