ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps

dc.contributor.author: Wu, Xue
dc.contributor.author: Lee, Woei-Jyh (Adam)
dc.contributor.author: Gupta, Damayanti
dc.contributor.author: Tseng, Chau-Wen
dc.date.accessioned: 2004-05-31T23:37:00Z
dc.date.available: 2004-05-31T23:37:00Z
dc.date.created: 2004-03
dc.date.issued: 2004-04-19
dc.description.abstract: Expressed sequence tags (ESTs) are short transcribed nucleotide sequences that can be used to discover new genes and measure gene expression. Because individual ESTs are short and error-prone, they must first be clustered to be useful. In this paper, we describe ESTmapper, a new tool that clusters EST sequences by efficiently mapping them to the genome. Our mapping algorithm first builds an eager write-only top-down (WOTD) suffix tree for the genome, then searches for long common substrings between each EST and the genome to build matching regions: gapped local alignments between the EST and the genome that account for sequencing errors and splicing. Long matching regions are then used to map ESTs to the genome and to place ESTs into clusters based on location. Preliminary experimental evaluation shows that, although ESTmapper requires a large amount of initial memory to store the genome suffix tree, it is quite precise and more efficient than previous techniques such as TGICL and PaCE when clustering large numbers of ESTs. (UMIACS-TR-2004-20)
dc.format.extent: 820944 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/1348
dc.language.iso: en_US
dc.relation.isAvailableAt: Digital Repository at the University of Maryland
dc.relation.isAvailableAt: University of Maryland (College Park, Md.)
dc.relation.isAvailableAt: Tech Reports in Computer Science and Engineering
dc.relation.isAvailableAt: UMIACS Technical Reports
dc.relation.ispartofseries: UM Computer Science Department; CS-TR-4575
dc.relation.ispartofseries: UMIACS; UMIACS-TR-2004-20
dc.title: ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps
dc.type: Technical Report
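
The abstract's idea of mapping ESTs to the genome and then clustering them by location can be illustrated with a minimal sketch. This is not ESTmapper's implementation: a plain k-mer dictionary stands in for the WOTD suffix tree, matches are exact and ungapped (no handling of sequencing errors or splicing), and all function names and parameters (build_kmer_index, map_est, cluster_by_location, k, max_gap) are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_est(est, index, genome, k):
    """Return (genome_start, length) of the longest exact match found by
    extending each shared k-mer seed to the right, or None if no seed matches."""
    best = None
    for j in range(len(est) - k + 1):
        for pos in index.get(est[j:j + k], ()):
            length = k
            while (j + length < len(est) and pos + length < len(genome)
                   and est[j + length] == genome[pos + length]):
                length += 1
            if best is None or length > best[1]:
                best = (pos, length)
    return best


def cluster_by_location(ests, genome, k=8, max_gap=0):
    """Group ESTs whose matched genome intervals overlap (or lie within max_gap).
    ESTs with no match are left out of the result."""
    index = build_kmer_index(genome, k)
    intervals = []
    for name, seq in ests.items():
        hit = map_est(seq, index, genome, k)
        if hit is not None:
            intervals.append((hit[0], hit[0] + hit[1], name))
    intervals.sort()

    clusters = []
    cluster_end = None
    for start, end, name in intervals:
        if cluster_end is not None and start <= cluster_end + max_gap:
            clusters[-1].append(name)          # overlaps the current cluster
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([name])            # starts a new cluster
            cluster_end = end
    return clusters


if __name__ == "__main__":
    # Toy data: the "ESTs" are exact slices of a made-up genome string.
    genome = "TTGACCGATAGGCTCAAGTCCATGGATTACGCAGTTCGAA"
    ests = {"est1": genome[2:14], "est2": genome[8:20], "est3": genome[26:38]}
    print(cluster_by_location(ests, genome, k=6))
```

In this toy run the first two ESTs map to overlapping genome intervals and the third maps elsewhere, so location alone separates them without any pairwise EST-to-EST comparison.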

Files

Original bundle
Name: CS-TR-4575.pdf
Size: 801.7 KB
Format: Adobe Portable Document Format