ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps
dc.contributor.author | Wu, Xue | en_US |
dc.contributor.author | Lee, Woei-Jyh (Adam) | en_US |
dc.contributor.author | Gupta, Damayanti | en_US |
dc.contributor.author | Tseng, Chau-Wen | en_US |
dc.date.accessioned | 2004-05-31T23:37:00Z | |
dc.date.available | 2004-05-31T23:37:00Z | |
dc.date.created | 2004-03 | en_US |
dc.date.issued | 2004-04-19 | en_US |
dc.description.abstract | Expressed sequence tags (ESTs) are short transcribed nucleotide sequences that can be used to discover new genes and measure gene expression. Because individual ESTs are short and error-prone, ESTs must first be clustered to be useful. In this paper, we describe ESTmapper, a new tool for clustering EST sequences based on efficiently mapping ESTs to the genome. Our mapping algorithm first builds an eager write-only top-down (WOTD) suffix tree for the genome, then searches for long common substrings between each EST and the genome to build matching regions: gapped local alignments between the EST and genome that account for sequencing errors and splicing. Long matching regions are then used to map ESTs to the genome and place ESTs into clusters based on location. Preliminary experimental evaluation shows that though ESTmapper requires a large amount of initial memory to store the genome suffix tree, it is quite precise and more efficient than previous techniques such as TGICL and PaCE when clustering large numbers of ESTs. (UMIACS-TR-2004-20) | en_US |
dc.format.extent | 820944 bytes | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/1903/1348 | |
dc.language.iso | en_US | |
dc.relation.isAvailableAt | Digital Repository at the University of Maryland | en_US |
dc.relation.isAvailableAt | University of Maryland (College Park, Md.) | en_US |
dc.relation.isAvailableAt | Tech Reports in Computer Science and Engineering | en_US |
dc.relation.isAvailableAt | UMIACS Technical Reports | en_US |
dc.relation.ispartofseries | UM Computer Science Department; CS-TR-4575 | en_US |
dc.relation.ispartofseries | UMIACS; UMIACS-TR-2004-20 | en_US |
dc.title | ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps | en_US |
dc.type | Technical Report | en_US |
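The map-then-cluster idea described in the abstract can be illustrated with a minimal sketch. It is not the ESTmapper implementation: a plain k-mer index stands in for the WOTD suffix tree, matching is exact only (no gapped alignment, so sequencing errors and splicing are not handled), and every name and parameter here (`build_kmer_index`, `longest_exact_match`, `cluster_by_location`, `k`, `min_len`) is hypothetical, as is the toy genome.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map each k-mer to its start positions in the genome.

    Assumption: a simple stand-in for the genome suffix tree.
    """
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def longest_exact_match(est, genome, index, k):
    """Return (genome_start, length) of the longest k-mer-seeded exact match, or None."""
    best = None
    for j in range(len(est) - k + 1):
        for g in index.get(est[j:j + k], ()):
            # Extend the seed to the right while both sequences agree.
            length = k
            while (j + length < len(est) and g + length < len(genome)
                   and est[j + length] == genome[g + length]):
                length += 1
            if best is None or length > best[1]:
                best = (g, length)
    return best


def cluster_by_location(ests, genome, k=4, min_len=8):
    """Group ESTs whose matched genome intervals overlap; unmapped ESTs are skipped."""
    index = build_kmer_index(genome, k)
    hits = []
    for name, seq in ests.items():
        match = longest_exact_match(seq, genome, index, k)
        if match is not None and match[1] >= min_len:
            hits.append((match[0], match[0] + match[1], name))
    hits.sort()
    clusters, current_end = [], -1
    for start, end, name in hits:
        if start >= current_end:
            clusters.append([])          # no overlap: open a new cluster
            current_end = end
        else:
            current_end = max(current_end, end)
        clusters[-1].append(name)
    return clusters


# Toy data (hypothetical): three ESTs cut from the genome, one unrelated read.
genome = "TTGACCGTAGGCTAACGGATCCATTGCAGTCAAGGTTCCA"
ests = {
    "est1": genome[2:12],
    "est2": genome[8:18],    # overlaps est1 on the genome
    "est3": genome[26:38],
    "est4": "GGGGGGGGGG",    # does not occur in the genome
}
print(cluster_by_location(ests, genome))  # [['est1', 'est2'], ['est3']]
```

Clustering by genome location means ESTs never need to be compared with each other pairwise; each one is matched against the genome index once, and grouping reduces to a sort over intervals.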