DISSECTING TUMOR CLONALITY IN LIVER CANCER: A PHYLOGENY ANALYSIS USING COMPUTATIONAL AND STATISTICAL TOOLS
Publication or External Link
Liver cancer is a heterogeneous disease characterized by extensive genetic and clonaldiversity. Understanding the clonal evolution of liver tumors is crucial for developing effective treatment strategies. This dissertation aims to dissect the tumor clonality in liver cancer using computational and statistical tools, with a focus on phylogenetic analysis. Through advancements in defining and assessing phylogenetic clusters, we gain a deeper understanding of the survival disparities and clonal evolution within liver tumors, which can inform the development of tailored treatment strategies and improve patient outcomes. The thesis begins by providing an overview of sources of heterogeneity in liver cancer and data types, from Whole-Exome (WEX) and RNA sequencing (RNA-seq) read-counts by gene to derived quantities such as Copy Number Alterations (CNAs) and Single Nucleotide Variants (SNVs). Various tools for deriving copy-numbers are discussed and compared. Additionally, comparison of survival distributions is discussed. The central data analyses of the thesis concern the derivation of distinct clones and clustered phylogeny types from the basic genomic data in three independent cancer cohorts, TCGA-LIHC, TIGER-LC and NCI-MONGOLIA. The SMASH (Subclone multiplicity allocation and somatic heterogeneity) algorithm is introduced for clonality analysis, followed by a discussion on clustering analysis of nonlinear tumor evolution trees and the construction of phylogenetic trees for liver cancer cohorts. Identification of drivers of tumor evolution, and the immune cell micro-environment of tumors are also explored. In this research, we employ survival analysis tools to investigate and document survival differences between groups of subjects defined from phylogenetic clusters. Specifically, we introduce the log-rank test and its modifications for generic right-censored survival data, which we then apply to survival follow-up data for the subjects in the studied cohorts, clustered based on their genomic data. The final chapter of this thesis takes a significant step forward by extending an existing methodology for covariate-adjustment in the two-sample log-rank test to a K-sample scenario, with a specific focus on the already defined phylogeny cluster groups. This extension is not straightforward because the computation of the test statistic for K-sample and its asymptotic null distribution do not follow directly from the two-sample case. Using these extended tools, we conduct an illustrative data analysis with real data from the TIGER-LC cohort, which comprises subjects with analyzed and clustered genomic data, leading to defined phylogenetic clusters associated with two different types of liver cancer. By applying the extended methodology to this dataset, we aim to effectively assess and validate the survival curves of the defined clusters.