Clustering Algorithms for Microarray Data Mining

dc.contributor.advisor: Baras, John S. [en_US]
dc.contributor.author: Bhamidipati, Phanikumar [en_US]
dc.contributor.department: ISR [en_US]
dc.date.accessioned: 2007-05-23T10:12:32Z
dc.date.available: 2007-05-23T10:12:32Z
dc.date.issued: 2002 [en_US]
dc.description.abstract [en_US]:

This thesis presents a systems engineering model of modern drug discovery processes and their systems integration requirements. Challenging problems include integrating public information content with proprietary corporate content, supporting different types of scientific analyses, and providing automated analysis tools suited to diverse forms of biological data.

To capture the requirements of the discovery system, we identify the processes, users, and scenarios that form a UML use case model. We then define the object-oriented system structure and attach behavioral elements to it. We also examine how object-relational database extensions can be applied to such analyses.

The next portion of the thesis studies the performance of clustering algorithms based on learning vector quantization (LVQ), support vector machines (SVMs), and other machine learning algorithms when applied to two types of analyses: functional and phenotypic classification. We found that LVQ initialized with the Linde-Buzo-Gray (LBG) codebook yields performance comparable to that of the optimal separating surfaces generated by related SVM kernels.

We also describe a novel similarity measure, the unnormalized symmetric Kullback-Leibler measure, which operates on unnormalized expression values. Because the Mercer criterion cannot be applied to this measure, it cannot be used as an SVM kernel; we therefore compared its performance with that of the log-Euclidean distance within the LVQ algorithm.

The two distance measures perform similarly on cDNA arrays, while the unnormalized symmetric Kullback-Leibler measure outperforms the log-Euclidean distance on certain phenotypic classification problems. Pre-filtering algorithms for finding discriminating instances, based on principal component analysis (PCA), the Find Similar function, and the IB3 instance-based learning algorithm, were also investigated. The Find Similar method gives the best performance across multiple criteria.
dc.format.extent: 861858 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/6295
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: ISR; MS 2002-4 [en_US]
dc.subject: Cross-Disciplinary Systems Education [en_US]
dc.title: Clustering Algorithms for Microarray Data Mining [en_US]
dc.type: Thesis [en_US]
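The abstract names two dissimilarity measures and an LVQ classifier but does not define them. The sketch below illustrates the idea under explicit assumptions, not the thesis's own definitions or code. It assumes that "log-Euclidean distance" means the Euclidean distance between natural-log-transformed expression values. It also assumes that the "unnormalized symmetric Kullback-Leibler measure" is D(x||y) + D(y||x), where the generalized KL divergence for positive, unnormalized vectors is D(x||y) = sum(x*log(x/y) - x + y); this reduces to sum((x - y) * log(x / y)). The LVQ training step is the basic LVQ1 rule. All function names and the toy data are hypothetical. The initial prototypes are passed in by the caller, and the LBG codebook initialization mentioned in the abstract is not implemented here.

```python
# Sketch under stated assumptions; not the thesis's code or exact definitions.
import math
import random


def log_euclidean(x, y):
    """Euclidean distance between natural-log-transformed expression vectors.

    Assumes every expression value is strictly positive.
    """
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x, y)))


def sym_kl_unnormalized(x, y):
    """Assumed form: symmetric generalized KL divergence for positive vectors.

    With D(x||y) = sum(x*log(x/y) - x + y), the linear terms cancel in
    D(x||y) + D(y||x), leaving sum((x - y) * log(x / y)).
    Assumes every value is strictly positive.
    """
    return sum((a - b) * math.log(a / b) for a, b in zip(x, y))


def nearest(x, protos, dist):
    """Index of the prototype closest to x under the given dissimilarity."""
    return min(range(len(protos)), key=lambda k: dist(x, protos[k]))


def lvq1_train(data, labels, prototypes, proto_labels, dist,
               lr=0.05, epochs=20, floor=1e-6, seed=0):
    """Basic LVQ1: pull the winning prototype toward a same-label sample and
    push it away from a different-label sample.

    Prototypes are supplied by the caller (e.g. from a codebook). Coordinates
    are clamped to `floor` so the log-based measures stay defined.
    """
    rng = random.Random(seed)
    protos = [list(p) for p in prototypes]
    order = list(range(len(data)))
    for epoch in range(epochs):
        rng.shuffle(order)
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying step size
        for i in order:
            x = data[i]
            j = nearest(x, protos, dist)
            sign = 1.0 if proto_labels[j] == labels[i] else -1.0
            protos[j] = [max(floor, p + sign * alpha * (xi - p))
                         for p, xi in zip(protos[j], x)]
    return protos


def classify(x, protos, proto_labels, dist):
    """Return the label of the nearest prototype."""
    return proto_labels[nearest(x, protos, dist)]


# Toy, made-up data: two classes of positive two-gene expression profiles.
data = [[1.0, 10.0], [1.2, 9.0], [0.9, 11.0],
        [10.0, 1.0], [9.0, 1.3], [11.0, 0.8]]
labels = ["A", "A", "A", "B", "B", "B"]

for dist in (log_euclidean, sym_kl_unnormalized):
    protos = lvq1_train(data, labels, [[2.0, 5.0], [5.0, 2.0]], ["A", "B"], dist)
    print(dist.__name__, classify([1.1, 9.5], protos, ["A", "B"], dist))
```

LVQ needs only a nearest-prototype search, so any dissimilarity function can be plugged into `dist`. This reflects the abstract's reasoning: a measure that cannot be shown to satisfy the Mercer condition is unsuitable as an SVM kernel, but it can still be compared against the log-Euclidean distance inside LVQ.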

Files

Original bundle
Name: MS_2002-4.pdf
Size: 841.66 KB
Format: Adobe Portable Document Format (PDF)