Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 5 of 5
  • Algorithmic approaches for investigating DNA Methylation in tumor evolution and heterogeneity
    (2024) Li, Xuan; Sahinalp, S. Cenk; Mount, Stephen M.; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Intratumor heterogeneity and tumor diversity pose significant challenges to personalized cancer diagnosis, treatment, and prognostics. While many studies seek to understand the complex dynamics of cancer with theoretically well-suited biomarkers like DNA mutations, the relative molecular rigidity and sparsity of mutations often make it challenging to reconstruct reliable tumor lineages from mutation profiles in practice. Epigenetic markers like DNA methylation, on the other hand, serve as a promising alternative to elucidate intratumor heterogeneity and tumor diversity. However, systematic research leveraging algorithmic approaches to investigate DNA methylation in the context of tumor evolution and heterogeneity remains limited. Aiming to address critical gaps in computational cancer research, this dissertation presents novel computational frameworks for analyzing DNA methylation at both single-cell and bulk levels and offers insights into methylation-based tumor heterogeneity, tumor evolutionary dynamics, and cellular composition in tumor samples for characterization of the complex epigenetic landscape of tumors. Chapter 2 and Chapter 3 introduce Sgootr (Single-cell Genomic methylatiOn tumOr Tree Reconstruction), the first distance-based computational method to jointly select tumor lineage-informative CpG sites and reconstruct tumor lineages from single-cell methylation data. Sgootr lays the groundwork for understanding tumor evolution through the lens of single-cell methylation profiles. Motivated by the need highlighted in Chapter 2 to overcome imbalances in single-cell methylation data across patient samples for interpretable comparative patient analysis, Chapter 4 presents FALAFL (FAir muLti-sAmple Feature seLection).
With integer linear programming (ILP) serving as its algorithmic backbone, FALAFL provides a fast and reliable solution to fairly select CpG sites across different single-cell methylation patient samples to optimally represent the entire patient cohort and identify reliable tumor lineage-informative CpG sites. Finally, Chapter 5 shifts the scope from single-cell to bulk tissue contexts and introduces Qombucha (Quadratic prOgraMming Based tUmor deConvolution with cell HierArchy), which is designed to tackle the challenges of bulk tissue analysis by inferring the methylation profiles of progenitor brain cells and determining cell type composition in bulk glioblastoma (GBM) samples. The work presented in this dissertation demonstrates the power of algorithmic and data science approaches to tackle some of the most pressing challenges in understanding the complexity of cancer epigenomics. With novel computational tools addressing current limitations in methylation data analysis, this work paves the way for further research in tumor evolution, personalized cancer treatment, and biomarker discovery. Overall, the computational frameworks and findings presented here bridge the gap between complex molecular data and clinically meaningful insights in the battle against cancer.
  • EXAMINING HIBERNATION IN THE BIG BROWN BAT THROUGH DNA METHYLATION
    (2021) Sullivan, Isabel; Wilkinson, Gerald S; Marine-Estuarine-Environmental Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Hibernation allows individuals to conserve energy during seasonal low temperatures. As the physiological regulation of hibernation is inadequately understood, I examine hibernation using DNA methylation (DNAm). DNAm is the addition of a methyl group to cytosine at cytosine-guanine dinucleotide (CpG) sites in the genome. DNAm in promoters can repress gene expression and be influenced by histone modifications. Using the big brown bat, Eptesicus fuscus, I examined how hibernation influences DNAm, independent of age, by comparing DNAm from bats that differed in hibernation history and by comparing DNAm from the same individual between hibernating and active seasons. Both comparisons found evidence of differential enrichment of genes near significant CpG sites resulting from hibernation. The latter analysis found evidence that a histone mark associated with active transcription is likely enriched in hibernating bats. These results suggest that DNAm and histone modifications associated with transcription factor binding regulate gene expression during hibernation.
  • ESTIMATION AND ANALYSIS OF CELL-SPECIFIC DNA METHYLATION FROM BISULFITE-SEQUENCING DATA
    (2018) Dorri, Faezeh; Bravo Corrada, Hector; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    DNA methylation is the best understood heritable gene regulatory mechanism that does not involve direct modification of the DNA sequence itself. Cells with different methylation profiles (over temporal or micro-environmental dimensions) may exhibit different phenotypic properties. In cancer, heterogeneity across cells in the tumor microenvironment presents significant challenges to treatment. In particular, epigenetic heterogeneity is discernible among tumor cells, and it is believed to impact the growth properties and treatment resistance of tumors. Existing computational methods used to study the epigenetic composition of cell populations are based on the analysis of DNA methylation modifications at multiple consecutive genomic loci spanned by single DNA sequencing reads. These approaches have yielded great insight into how cell populations differ epigenetically across different tissues. However, they only provide a general summary of the epigenetic composition of these cell populations without providing cell-specific methylation patterns over longer genomic spans to perform a comprehensive analysis of the epigenetic heterogeneity of cell populations. In this dissertation, we address this challenge by proposing two computational methods called methylFlow and MCFDiff. In methylFlow, we propose a novel method based on network flow algorithms to reconstruct cell-specific methylation profiles using reads obtained from sequencing bisulfite-converted DNA. We reveal the methylation profile of underlying clones in a heterogeneous cell population, including the methylation patterns and their corresponding abundance within the population. In MCFDiff, we propose a statistical model that leverages the identified cell-specific methylation profiles (from methylFlow) to determine regions of differential methylation composition (RDMCs) between multiple phenotypic groups, in particular, between tumor and paired normal tissue.
In MCFDiff, we can systematically exclude the tumor tissue impurities and increase the accuracy in detecting the regions with differential methylation composition in normal and tumor samples. Profiling the changes between normal and tumor samples according to the reconstructed methylation profiles of underlying clones in different samples leads us to the discovery of de novo epigenetic markers and a better understanding of the effect of epigenetic heterogeneity on cancer dynamics, from initiation and progression to metastasis and relapse.
  • MAPPING AND CHARACTERIZATION OF FUNCTIONAL INNOVATIONS IN CIS-ACTING ELEMENTS AND TRANS-ACTING FACTORS
    (2017) Sarda, Shrutii; Hannenhalli, Sridhar; Biology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The primary mediators of transcriptional regulation are the cis-regulatory elements (CREs), viz., promoters and enhancers, and the trans-acting factors (TFs) that bind to the CREs. First, the landscape of distinct sequence elements that regulate the spatio-temporal activity profiles of genes is far from complete. For example, several (or alternate) CREs can, in a context-specific fashion, regulate transcription of one gene. Second, mutations that occur in the coding sequences of TFs, or those occurring in CREs that determine TF binding sites, may change the identity of the cognate TF or alter the affinity with which a site is bound, respectively. This in turn introduces a change in the logic of the transcriptional regulatory circuits harboring these modifications and leads to adaptations in the form of novel gene expression patterns, or robust responses to internal or external signals. CREs and trans-acting factors thus provide an extensive platform for regulatory innovation, the extent of which is only beginning to be appreciated. In this thesis, we discuss three yet-unexplored avenues of regulatory innovation and provide novel insights into each program. Cis-regulatory rewiring mediated by CREs: A co-regulated module of genes (“regulon”) can have evolutionarily conserved expression and yet have diverged upstream regulators across species, such as the ribosomal regulon, which is regulated by the transcription factor (TF) TBF1 in C. albicans, instead of RAP1 in S. cerevisiae. Only a handful of such rewiring events have been established, and the prevalence of such events and the conditions conducive to them are not well known. Here, we develop a novel probabilistic scoring method to comprehensively screen for rewiring within regulons across 23 yeast species. Our analysis recapitulates known events, and suggests TF candidates for certain processes reported to be under distinct regulatory controls in S. cerevisiae and C. albicans, for which the implied regulators are not known.
Independent functional analyses of rewiring TF pairs revealed greater functional interactions, common upstream regulators and shared biological processes between them. Our study reveals that cis-rewiring is pervasive and generates a high-confidence resource of specific events. Interaction-mediated regulatory rewiring in TFs: Similar to evolutionary changes in the sequence of CREs, changes within coding regions of TFs can allow for altered protein-protein interaction capabilities and function, through motif and domain turnover across evolution. For example, FTZ has switched from a homeotic TF in ancestral insect species to being involved in segmentation in the Drosophila genus by the loss of a YPWM motif and the gain of a LXXLL motif. Elucidating the occurrence of, and mechanisms underlying, these switches in TF function is critical to our understanding of evolution. To this end, we developed an approach to detect protein interaction regulatory rewiring across 1200 TFs in 12 related arthropod species. Simulation studies show that the accuracy of event detection is approximately 80-85%. We recapitulate the known FTZ rewiring event and find several members of the “enhancer of split” complex represented amongst top events, consistent with previous knowledge that the latter has undergone lineage-specific losses and duplications across arthropod evolution. Overall, this work establishes that interaction-rewiring is quite prevalent in arthropod development, and provides a high-confidence list of such candidates. Orphan CGI alternative promoter potential: CGIs are regions with a relatively high frequency of CpG sites. CGIs that occur within gene promoters are historically well studied. Yet, about 50% of all CGIs lie outside of promoter regions (called orphan CGIs), and not much is understood about their biological significance.
We show through extensive analysis of the methylome and transcriptome in 34 tissues that, in many cases of highly expressed genes with methylated promoters, transcription is initiated by a distal orphan CGI located several tens of kb away that functions as an alternative promoter. We found strong evidence of transcription initiation at the upstream CGI and a lack thereof at the methylated proximal promoter itself. CGI-initiated transcripts are associated with signals of stable elongation and splicing that extend into the gene body, as evidenced by tissue-specific RNA-seq and other DNA-encoded splice signals. Overall, our study describes an unreported mechanism of transcription of methylated proximal promoter genes in a tissue-specific fashion.
  • Epiviz: Integrative Visual Analysis Software for Genomics
    (2015) Chelaru, Florin; Corrada Bravo, Hector; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user-specific, have been poorly addressed by these tools. Commonly, the data visualized in these tools are the output of analyses performed in powerful computing environments like R/Bioconductor or Python. Two essential aspects of data analysis are usually treated as distinct, in spite of being part of the same exploratory process: algorithmic analysis and interactive visualization. In current technologies these are not integrated within one tool, but rather, one precedes the other. Recent technological advances in web-based data visualization have made it possible for interactive visualization tools to tightly integrate with powerful algorithmic tools, without being restricted to one such tool in particular. We introduce Epiviz (http://epiviz.cbcb.umd.edu), an integrative visualization tool that bridges the gap between the two types of tools, simplifying genomic data analysis workflows. Epiviz is the first genomics interactive visualization tool to provide tight-knit integration with computational and statistical modeling and data analysis.
We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various levels; 2) it takes the first steps in the direction of collaborative data analysis by incorporating user plugins from source control providers, as well as by allowing analysis states to be shared among the scientific community; 3) it combines established analysis features that have never before been available simultaneously in a visualization tool for genomics. Epiviz can be used in multiple branches of genomics data analysis for various types of datasets, of which we detail two: functional genomics data, aligned to a continuous coordinate such as the genome, and metagenomics, organized according to volatile hierarchical coordinate spaces. We also present security implications of the current design, performance benchmarks, a series of limitations and future research steps.