Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756

  • ESTIMATION AND ANALYSIS OF CELL-SPECIFIC DNA METHYLATION FROM BISULFITE-SEQUENCING DATA
    (2018) Dorri, Faezeh; Bravo Corrada, Hector; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    DNA methylation is the best understood heritable gene regulatory mechanism that does not involve direct modification of the DNA sequence itself. Cells with different methylation profiles (over temporal or micro-environmental dimensions) may exhibit different phenotypic properties. In cancer, heterogeneity across cells in the tumor microenvironment presents significant challenges to treatment. In particular, epigenetic heterogeneity is discernible among tumor cells, and it is believed to impact the growth properties and treatment resistance of tumors. Existing computational methods used to study the epigenetic composition of cell populations are based on the analysis of DNA methylation modifications at multiple consecutive genomic loci spanned by single DNA sequencing reads. These approaches have yielded great insight into how cell populations differ epigenetically across different tissues. However, they provide only a general summary of the epigenetic composition of these cell populations; they do not provide the cell-specific methylation patterns over longer genomic spans that are needed for a comprehensive analysis of the epigenetic heterogeneity of cell populations. In this dissertation, we address this challenge by proposing two computational methods called methylFlow and MCFDiff. In methylFlow, we propose a novel method based on network flow algorithms to reconstruct cell-specific methylation profiles using reads obtained from sequencing bisulfite-converted DNA. We reveal the methylation profiles of the underlying clones in a heterogeneous cell population, including the methylation patterns and their corresponding abundance within the population. In MCFDiff, we propose a statistical model that leverages the identified cell-specific methylation profiles (from methylFlow) to determine regions of differential methylation composition (RDMCs) between multiple phenotypic groups, in particular, between tumor and paired normal tissue.
    In MCFDiff, we can systematically exclude the tumor tissue impurities and increase the accuracy in detecting the regions with differential methylation composition in normal and tumor samples. Profiling the changes between normal and tumor samples according to the reconstructed methylation profiles of the underlying clones in different samples leads us to the discovery of de novo epigenetic markers and a better understanding of the effect of epigenetic heterogeneity on cancer dynamics, from initiation and progression to metastasis and relapse.
  • Epiviz: Integrative Visual Analysis Software for Genomics
    (2015) Chelaru, Florin; Corrada Bravo, Hector; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user-specific, have been poorly addressed by these tools. Commonly, the data visualized in these tools is the output of analyses performed in powerful computing environments like R/Bioconductor or Python. Two essential aspects of data analysis are usually treated as distinct, in spite of being part of the same exploratory process: algorithmic analysis and interactive visualization. In current technologies these are not integrated within one tool, but rather, one precedes the other. Recent technological advances in web-based data visualization have made it possible for interactive visualization tools to tightly integrate with powerful algorithmic tools, without being restricted to one such tool in particular. We introduce Epiviz (http://epiviz.cbcb.umd.edu), an integrative visualization tool that bridges the gap between the two types of tools, simplifying genomic data analysis workflows. Epiviz is the first genomics interactive visualization tool to provide tight-knit integration with computational and statistical modeling and data analysis.
    We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various levels; 2) it takes the first steps in the direction of collaborative data analysis by incorporating user plugins from source control providers, as well as by allowing analysis states to be shared among the scientific community; 3) it combines established analysis features that have never before been available simultaneously in a visualization tool for genomics. Epiviz can be used in multiple branches of genomics data analysis for various types of datasets, of which we detail two: functional genomics data, aligned to a continuous coordinate such as the genome, and metagenomics, organized according to volatile hierarchical coordinate spaces. We also present security implications of the current design, performance benchmarks, a series of limitations, and future research steps.