ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps

Files

CS-TR-4575.pdf (801.7 KB)
No. of downloads: 951

Date

2004-04-19

Abstract

Expressed sequence tags (ESTs) are short transcribed nucleotide sequences that can be used to discover new genes and measure gene expression. Because individual ESTs are short and error-prone, they must first be clustered to be useful. In this paper, we describe ESTmapper, a new tool that clusters EST sequences by efficiently mapping them to the genome. Our mapping algorithm first builds an eager write-only top-down (WOTD) suffix tree for the genome, then searches for long common substrings between each EST and the genome to build matching regions, that is, gapped local alignments between the EST and the genome that account for sequencing errors and splicing. Long matching regions are then used to map ESTs to the genome and to place ESTs into clusters based on location. Preliminary experimental evaluation shows that although ESTmapper requires a large amount of initial memory to store the genome suffix tree, it is quite precise and more efficient than previous techniques such as TGICL and PaCE when clustering large numbers of ESTs. (UMIACS-TR-2004-20)
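
The map-then-cluster idea in the abstract can be illustrated with a small sketch. This is not ESTmapper's implementation. It assumes a plain k-mer hash index in place of the WOTD suffix tree, finds only exact (ungapped) matches, so it ignores sequencing errors and splicing, and groups ESTs whose mapped genome intervals overlap. The function names, the toy genome, and the seed length `k` are all hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts.
    (Assumption: a stand-in for the suffix tree described in the abstract.)"""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_est(est, genome, index, k):
    """Return the genome interval (start, end) of the longest exact match
    found by extending k-mer seeds to the right, or None if no seed hits."""
    best = None
    for j in range(len(est) - k + 1):
        for g in index.get(est[j:j + k], ()):
            length = k
            # Extend the seed while the EST and genome agree base by base.
            while (j + length < len(est)
                   and g + length < len(genome)
                   and est[j + length] == genome[g + length]):
                length += 1
            if best is None or length > best[1] - best[0]:
                best = (g, g + length)
    return best


def cluster_by_location(intervals):
    """Group EST names whose mapped genome intervals overlap."""
    clusters = []
    current, current_end = [], -1
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1]):
        if current and start < current_end:
            current.append(name)
            current_end = max(current_end, end)
        else:
            if current:
                clusters.append(current)
            current, current_end = [name], end
    if current:
        clusters.append(current)
    return clusters


# Toy data (hypothetical): three error-free ESTs cut from a 50-base genome.
K = 6
genome = "ACGTTGCAAGGCTTACCGATGTCAGTTCCAGGATCGTAACTGGCATTCGA"
ests = {
    "est_a": genome[2:14],
    "est_b": genome[8:20],
    "est_c": genome[30:42],
}

index = build_kmer_index(genome, K)
intervals = {}
for name, seq in ests.items():
    hit = map_est(seq, genome, index, K)
    if hit is not None:  # unmapped ESTs are simply left unclustered here
        intervals[name] = hit

clusters = cluster_by_location(intervals)
print(clusters)
```

In this toy run, `est_a` and `est_b` overlap on the genome and fall into one cluster, while `est_c` maps elsewhere and forms its own. The index is built once for the genome and reused for every EST, which mirrors the trade-off the abstract describes: a large up-front memory cost for the genome index in exchange for fast per-EST mapping.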
