Algorithmic approaches for investigating DNA Methylation in tumor evolution and heterogeneity

Files

Li_umd_0117E_24773.pdf (20.01 MB)
(RESTRICTED ACCESS)

Date

2024

Abstract

Intratumor heterogeneity and tumor diversity pose significant challenges to personalized cancer diagnosis, treatment, and prognosis. While many studies seek to understand the complex dynamics of cancer using theoretically well-suited biomarkers such as DNA mutations, the relative molecular rigidity and sparsity of mutations often make it challenging in practice to reconstruct reliable tumor lineages from mutation profiles. Epigenetic markers such as DNA methylation, on the other hand, are a promising alternative for elucidating intratumor heterogeneity and tumor diversity. However, systematic research that uses algorithmic approaches to investigate DNA methylation in the context of tumor evolution and heterogeneity remains limited. To address these critical gaps in computational cancer research, this dissertation presents novel computational frameworks for analyzing DNA methylation at both the single-cell and bulk levels. These frameworks offer insights into methylation-based tumor heterogeneity, tumor evolutionary dynamics, and the cellular composition of tumor samples, helping to characterize the complex epigenetic landscape of tumors.

Chapters 2 and 3 introduce Sgootr (Single-cell Genomic methylatiOn tumOr Tree Reconstruction), the first distance-based computational method to jointly select tumor lineage-informative CpG sites and reconstruct tumor lineages from single-cell methylation data. Sgootr lays the groundwork for understanding tumor evolution through the lens of single-cell methylation profiles. Chapter 2 also highlights the need to overcome imbalances in single-cell methylation data across patient samples so that comparative patient analyses remain interpretable. Motivated by this need, Chapter 4 presents FALAFL (FAir muLti-sAmple Feature seLection). Built on integer linear programming (ILP), FALAFL provides a fast and reliable way to select CpG sites fairly across single-cell methylation samples from different patients, so that the selected sites optimally represent the entire patient cohort and reliable tumor lineage-informative CpG sites can be identified. Finally, Chapter 5 shifts from the single-cell to the bulk tissue setting and introduces Qombucha (Quadratic prOgraMming Based tUmor deConvolution with cell HierArchy), which addresses the challenges of bulk tissue analysis by inferring the methylation profiles of progenitor brain cells and determining cell type composition in bulk glioblastoma (GBM) samples.

The work presented in this dissertation demonstrates the power of algorithmic and data science approaches in tackling some of the most pressing challenges in understanding the complexity of cancer epigenomics. By providing novel computational tools that address current limitations in methylation data analysis, this work paves the way for further research in tumor evolution, personalized cancer treatment, and biomarker discovery. Overall, the computational frameworks and findings presented here bridge the gap between complex molecular data and clinically meaningful insights in the battle against cancer.
