Improving the Performance and Precision of Bioinformatics Algorithms

Wu, Xue

dc.contributor.advisor	Tseng, Chau-Wen	en_US
dc.contributor.author	Wu, Xue	en_US
dc.contributor.department	Computer Science	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2008-10-11T05:53:10Z
dc.date.available	2008-10-11T05:53:10Z
dc.date.issued	2008-08-25	en_US
dc.description.abstract	Recent advances in biotechnology have enabled scientists to generate and collect huge amounts of biological experimental data. Software tools for analyzing both genomic (DNA) and proteomic (protein) data with high speed and accuracy have thus become very important in modern biological research. This thesis presents several techniques for improving the performance and precision of bioinformatics algorithms used by biologists. Improvements in both the speed and cost of automated DNA sequencers have allowed scientists to sequence the DNA of an increasing number of organisms. One way biologists can take advantage of this genomic DNA data is to use it in conjunction with expressed sequence tag (EST) and cDNA sequences to find genes and their splice sites. This thesis describes ESTmapper, a tool designed to use an eager write-only top-down (WOTD) suffix tree to efficiently align DNA sequences against known genomes. Experimental results show that ESTmapper can be much faster than previous techniques for aligning and clustering DNA sequences, and produces alignments of comparable or better quality. Peptide identification by tandem mass spectrometry (MS/MS) is becoming the dominant high-throughput proteomics workflow for protein characterization in complex samples. Biologists currently rely on protein database search engines to identify peptides producing experimentally observed mass spectra. This thesis describes two approaches for improving peptide identification precision using statistical machine learning. HMMatch (HMM MS/MS Match) is a hidden Markov model approach to spectral matching, in which many examples of a peptide fragmentation spectrum are summarized in a generative probabilistic model that captures the consensus and variation of each peak's intensity. Experimental results show that HMMatch can identify many peptides missed by traditional spectral matching and search engines. 
PepArML (Peptide Identification Arbiter by Machine Learning) is a machine-learning-based framework for improving the precision of peptide identification. It uses classification algorithms to combine spectral features and scores from multiple search engines in a single model-free framework that can be trained in an unsupervised manner. Experimental results show that PepArML can improve the sensitivity of peptide identification for several synthetic protein mixtures compared with individual search engines.	en_US
dc.format.extent	1184612 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1903/8634
dc.language.iso	en_US
dc.subject.pqcontrolled	Computer Science	en_US
dc.title	Improving the Performance and Precision of Bioinformatics Algorithms	en_US
dc.type	Dissertation	en_US

Files

Original bundle

Name: umi-umd-5751.pdf
Size: 1.13 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Computer Science Theses and Dissertations