ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps

Date
2004-04-19
Author
Wu, Xue
Lee, Woei-Jyh (Adam)
Gupta, Damayanti
Tseng, Chau-Wen
Abstract
Expressed sequence tags (ESTs) are short transcribed
nucleotide sequences that can be used to discover new genes
and measure gene expression. Because individual ESTs are short
and error-prone, ESTs must first be clustered to be useful.
In this paper, we describe ESTmapper, a new tool for clustering
EST sequences based on efficiently mapping ESTs to the genome.
Our mapping algorithm first builds an eager
write-only top-down (WOTD) suffix tree
for the genome, then searches for long common substrings
between each EST and the genome to build matching regions:
gapped local alignments between the EST and genome that account
for sequencing errors and splicing. Long matching regions
are then used to map ESTs to the genome and place ESTs into
clusters based on location. Preliminary experimental evaluation
shows that though ESTmapper requires a large amount of initial
memory to store the genome suffix tree, it is quite precise and
more efficient than previous techniques such as TGICL and PaCE
when clustering large numbers of ESTs.
(UMIACS-TR-2004-20)
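
The map-then-cluster idea described in the abstract can be illustrated with a minimal Python sketch. It is not the ESTmapper implementation and makes several simplifying assumptions: a plain k-mer index stands in for the WOTD suffix tree, only exact matches are extended (no gapped alignment, no handling of sequencing errors or splicing), and each EST is mapped to the single interval spanned by all of its long matches. The function names (build_kmer_index, map_est, cluster_by_location) and the parameters k and min_len are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer of the genome to the positions where it starts.

    Assumption: this simple index replaces the suffix tree used in the paper.
    """
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_est(est, genome, index, k, min_len):
    """Return the half-open genomic interval (start, end) covered by exact
    matches of at least min_len bases (min_len >= k), or None if none exist."""
    hits = []
    for j in range(len(est) - k + 1):
        for g in index.get(est[j:j + k], ()):
            # Skip seeds lying inside a match that an earlier seed already extended.
            if j > 0 and g > 0 and est[j - 1] == genome[g - 1]:
                continue
            length = k
            while (j + length < len(est) and g + length < len(genome)
                   and est[j + length] == genome[g + length]):
                length += 1
            if length >= min_len:
                hits.append((g, g + length))
    if not hits:
        return None
    # Simplification: span all long matches with one interval.
    return min(s for s, _ in hits), max(e for _, e in hits)


def cluster_by_location(intervals):
    """Group EST names whose mapped intervals overlap on the genome.

    intervals: dict mapping EST name -> (start, end), half-open.
    """
    clusters = []
    current, current_end = [], None
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1]):
        if current and start < current_end:
            current.append(name)
            current_end = max(current_end, end)
        else:
            if current:
                clusters.append(current)
            current, current_end = [name], end
    if current:
        clusters.append(current)
    return clusters
```

In this sketch, ESTs that map to overlapping genomic intervals land in the same cluster, and ESTs with no sufficiently long match are left unmapped (map_est returns None) and would need separate handling.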