Model-Based Genomic/Proteomic Signal Processing in Cancer Diagnosis and Prediction


In recent years, high-throughput measurement technologies (gene microarrays, protein mass spectrometry) have made it possible to monitor the expression of thousands of genes or proteins simultaneously. A topic of great interest is the difference in gene/protein expression between normal and cancer subjects. Various data-driven methods have been proposed in the literature, e.g. clustering and machine learning methods. In this thesis, an alternative model-driven approach is proposed. The proposed dependence model focuses on the interactions among genes or proteins. We show that the dependence model is highly effective in classifying normal and cancer data. Moreover, unlike data-driven methods, the dependence model carries specific biological meaning, and it has the potential for early prediction of cancer. The concept of a dependence network is proposed based on the dependence model. The dependence network models the interactions and co-regulation relationships among genes or proteins, from which we are able to reliably identify biomarkers, i.e. genes or proteins important for cancer prediction and drug development.

The analysis extends to cell cycle time-series, where one subject is measured at multiple time points during the cell cycle. Studying the cell cycle will greatly improve our understanding of the mechanisms of cancer development. In cell cycle time-series, measurements are taken from a population of cells that are assumed to be synchronized. However, synchronization is continuously lost because individual cells grow at different rates. The time-series measurement is therefore a distorted version of the single-cell expression. In this thesis, we propose a polynomial-model-based resynchronization scheme, which successfully removes the distortion. The time-series data are further analyzed to identify gene regulatory relationships. The existing literature on identifying regulatory relationships mainly studies the relationship between several regulators and one regulated gene. In this thesis, we use the eigenvalue pattern of the dependence model to characterize several regulated genes, and propose a novel method that examines the relationship between several regulators and several regulated genes simultaneously.