Improving the Performance and Precision of Bioinformatics Algorithms

dc.contributor.advisor (en_US): Tseng, Chau-Wen
dc.contributor.author (en_US): Wu, Xue
dc.contributor.department (en_US): Computer Science
dc.contributor.publisher (en_US): Digital Repository at the University of Maryland
dc.contributor.publisher (en_US): University of Maryland (College Park, Md.)
dc.date.accessioned: 2008-10-11T05:53:10Z
dc.date.available: 2008-10-11T05:53:10Z
dc.date.issued (en_US): 2008-08-25
dc.description.abstract (en_US): Recent advances in biotechnology have enabled scientists to generate and collect huge amounts of biological experimental data. Software tools for analyzing both genomic (DNA) and proteomic (protein) data with high speed and accuracy have thus become very important in modern biological research. This thesis presents several techniques for improving the performance and precision of bioinformatics algorithms used by biologists.

Improvements in both the speed and cost of automated DNA sequencers have allowed scientists to sequence the DNA of an increasing number of organisms. One way biologists can take advantage of this genomic DNA data is to use it in conjunction with expressed sequence tag (EST) and cDNA sequences to find genes and their splice sites. This thesis describes ESTmapper, a tool designed to use an eager write-only top-down (WOTD) suffix tree to efficiently align DNA sequences against known genomes. Experimental results show that ESTmapper can be much faster than previous techniques for aligning and clustering DNA sequences, and produces alignments of comparable or better quality.

Peptide identification by tandem mass spectrometry (MS/MS) is becoming the dominant high-throughput proteomics workflow for protein characterization in complex samples. Biologists currently rely on protein database search engines to identify peptides producing experimentally observed mass spectra. This thesis describes two approaches for improving peptide identification precision using statistical machine learning. HMMatch (HMM MS/MS Match) is a hidden Markov model approach to spectral matching, in which many examples of a peptide fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. Experimental results show that HMMatch can identify many peptides missed by traditional spectral matching and search engines.

PepArML (Peptide Identification Arbiter by Machine Learning) is a machine learning based framework for improving the precision of peptide identification. It uses classification algorithms to effectively utilize spectra features and scores from multiple search engines in a single model-free framework that can be trained in an unsupervised manner. Experimental results show that PepArML can improve the sensitivity of peptide identification for several synthetic protein mixtures compared with individual search engines.
dc.format.extent: 1184612 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/8634
dc.language.iso: en_US
dc.subject.pqcontrolled (en_US): Computer Science
dc.title (en_US): Improving the Performance and Precision of Bioinformatics Algorithms
dc.type (en_US): Dissertation

Files

Original bundle
Name: umi-umd-5751.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format