Computer Science Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/2756
4 results
Search Results
Item Predicting Cancer Prognosis and Drug Response from the Tumor Microbiome (2021) Hermida, Leandro Cruz; Ruppin, Eytan; Patro, Robert; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Tumor gene expression is predictive of patient prognosis in some cancers. However, RNA-seq and whole genome sequencing data contain not only reads from host tumor and normal tissue, but also reads from the tumor microbiome, which can be used to infer the microbial abundances in each tumor. Here, we show that tumor microbial abundances, alone or in combination with tumor gene expression data, can predict cancer prognosis and drug response to some extent. Microbial abundances are significantly less predictive of prognosis than gene expression but, remarkably, about as predictive of drug response, though mostly in different cancer-drug combinations. Thus, it appears possible to leverage existing sequencing technology, or develop new protocols, to obtain more non-redundant information about prognosis and drug response from RNA-seq and whole genome sequencing experiments than could be obtained from gene expression or mutation data alone.

Item CHARACTERIZATION OF SURVIVAL ASSOCIATED GENE INTERACTIONS AND LYMPHOCYTE HETEROGENEITY IN CANCER (2019) Magen, Assaf; Hannenhalli, Sridhar; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Cancer is the second leading cause of death globally. Tumors form intricate ecosystems in which malignant and immune cells interact to shape disease progression. Yet, the molecular underpinnings of tumorigenesis and immunological responses to tumors are poorly understood, limiting their manipulation to elicit favorable clinical outcomes. This thesis lays conceptual frameworks for investigating the molecular interactions taking place in tumors as well as the diversity of the immune response to cancer.
At the molecular level of individual cancer cells, the phenotypic effect of perturbing a gene’s activity depends on the activity level of other genes, reflecting the notion that phenotypes are emergent properties of a network of functionally interacting genes. In the context of cancer, contemporary investigations have primarily focused on just one type of functional genetic interaction (GI): synthetic lethality (SL). However, there may be additional types of GIs whose systematic identification would enrich the molecular and functional characterization of cancer. This thesis describes a novel data-driven approach, EnGIne, which, applied to large-scale cancer data, identifies 71,946 GIs spanning 12 distinct types, only a small minority of which are SLs. The detected GIs explain cancer driver genes’ tissue-specificity and differences in patients’ response to drugs, and stratify breast cancer tumors into refined subtypes. These results expand the scope of cancer GIs and lay a conceptual and computational basis for future studies of additional types of GIs and their translational applications. Furthermore, tumor growth is continuously shaped by the immune response. However, T cells typically adopt a dysfunctional phenotype that may be reversed using immunotherapy strategies. Most current tumor immunotherapies leverage cytotoxic CD8+ T cells to elicit an effective anti-tumor response. Despite evidence for the clinical potential of CD4+ tumor-infiltrating lymphocytes (TILs), their functional diversity has limited our ability to harness their anti-tumor activity. To address this issue, we have used single-cell mRNA sequencing (scRNAseq) to analyze the response of CD4+ T cells specific for a defined recombinant tumor antigen, both in the tumor microenvironment and in draining lymph nodes (dLN).
New computational approaches to characterize subpopulations identified TIL transcriptomic patterns strikingly distinct from those elicited by responses to infection, and dominated by diversity among T-bet-expressing T helper type 1 (Th1)-like cells. In contrast, the dLN response includes follicular helper (Tfh)-like cells but lacks Th1 cells. We identify an interferon-driven signature in Th1-like TILs, and show that it is found in human liver cancer and melanoma, in which it is negatively associated with response to checkpoint therapy. Our study unveils unsuspected differences between tumor and virus CD4+ T cell responses, and provides a proof-of-concept methodology to characterize tumor-control CD4+ T cell effector programs. Targeting these programs should help improve immunotherapy strategies.

Item Computational Metagenomics: Network, Classification and Assembly (2012) Liu, Bo; Pop, Mihai; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Due to the rapid advance of DNA sequencing technologies over the past 10 years, large numbers of short DNA reads can be obtained quickly and cheaply. For example, a single Illumina HiSeq machine can produce several terabytes of data within a week. Metagenomics is a new scientific field that involves the analysis of genomic DNA sequences obtained directly from the environment, enabling studies of novel microbial systems. Metagenomics was made possible by high-throughput sequencing technologies. Analyzing the resulting data requires sophisticated computational methods and data mining. In clinical settings, a fundamental goal of metagenomics is to help diagnose and cure disease. One major bottleneck so far is how to analyze these huge, noisy data sets quickly and precisely. My PhD research focuses on developing algorithms and tools to tackle these challenging and interesting computational problems.
From the functional perspective, a metagenomic sample can be represented as a weighted metabolic network, in which the nodes are molecules, the edges are enzymes encoded by genes, and the weights can be interpreted as the number of organisms providing each function. One goal of functional comparison between metagenomic samples is to find metabolic subnetworks that are differentially abundant between the two groups under comparison. We have developed a statistical network analysis tool, MetaPath, which uses a greedy search algorithm to find the maximum-weight subnetwork and a nonparametric permutation test to measure its statistical significance. Unlike previous approaches, MetaPath explicitly searches for significant subnetworks in the global network, enabling us to detect signatures at a finer level. In addition, we developed statistical methods that take into account the topology of the network when testing the significance of the subnetworks. Another computational problem involves classifying anonymous DNA sequences obtained from metagenomic samples. There are several challenges here: (1) The classification labels follow a hierarchical tree structure, in which the leaves are most specific and the internal nodes are more general. How can we classify novel sequences that do not belong to leaf categories (species) but do belong to internal groups (e.g., phylum)? (2) For each classification, how can we compute a confidence score, so that users can trade off sensitivity against specificity? (3) How can we analyze billions of data items quickly? We have developed a novel hierarchical classifier (MetaPhyler) for the classification of anonymous DNA reads. Through simulation, MetaPhyler models the distribution of pairwise similarities within different hierarchical groups with nonparametric density estimation. The confidence score is computed as a likelihood ratio.
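The idea of a likelihood-ratio confidence score built on nonparametric density estimates can be illustrated with a minimal sketch. This is not MetaPhyler's implementation: the Gaussian kernel, the fixed bandwidth, and the names `gaussian_kde` and `log_likelihood_ratio` are assumptions made purely for illustration, and the similarity values are made up.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel.

    Hypothetical helper: the kernel and bandwidth are illustrative assumptions.
    """
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density

def log_likelihood_ratio(similarity, within_group, outside_group, bandwidth=0.05):
    """Log ratio of the two estimated densities at the observed similarity.

    Positive values favor membership in the group, negative values favor
    non-membership. A tiny constant guards against division by zero.
    """
    eps = 1e-300
    p_in = gaussian_kde(within_group, bandwidth)(similarity)
    p_out = gaussian_kde(outside_group, bandwidth)(similarity)
    return math.log((p_in + eps) / (p_out + eps))

# Made-up pairwise similarities for sequences inside and outside one group.
within = [0.88, 0.90, 0.92, 0.95]
outside = [0.45, 0.50, 0.55, 0.60]
score = log_likelihood_ratio(0.91, within, outside)
```

Thresholding such a score at different values is one way a user could trade sensitivity against specificity.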
For a query DNA sequence of arbitrary length, its similarity can be calculated through linear approximation. Through benchmark comparison, we have shown that MetaPhyler is significantly faster and more accurate than previous tools. DNA sequencing machines can only produce very short strings (e.g., 100 bp) relative to the size of a genome (e.g., a typical bacterial genome is 5 Mbp). One of the most challenging computational tasks is the assembly of millions of short reads into longer contigs, which are used as the basis of subsequent computational analyses. In this project, we have developed a comparative metagenomic assembler (MetaCompass), which uses previously sequenced genomes and produces long contigs through read mapping (alignment) and assembly. Given the availability of thousands of existing bacterial genomes, for a particular sample, MetaCompass first chooses the best subset as references based on the taxonomic composition. Then, the reads are aligned against these genomes using MUMmer-map or Bowtie2. Afterwards, we use a greedy algorithm for the minimum set-cover problem to build long contigs, and the consensus sequences are computed by majority rule. We also propose an iterative approach to improve the performance. Finally, MetaCompass has been successfully evaluated and tested on over 20 terabytes of metagenomic data sets generated by the Human Microbiome Project. In addition, to facilitate the identification and characterization of antibiotic resistance genes, we have created the Antibiotic Resistance Genes Database (ARDB), which provides a centralized compendium of information on antibiotic resistance.
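The greedy set-cover and majority-rule steps mentioned above for MetaCompass can be sketched generically as follows. This is a textbook illustration rather than MetaCompass code: the function names, the representation of candidates as sets of covered positions, and the toy inputs are all assumptions.

```python
from collections import Counter

def greedy_set_cover(universe, candidates):
    """Standard greedy heuristic for minimum set cover.

    universe: iterable of items to cover (e.g., reference positions).
    candidates: dict mapping a candidate name to the set of items it covers.
    Returns the chosen names in selection order and any items left uncovered.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered items.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            break  # nothing left can be covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

def majority_consensus(columns):
    """Majority-rule consensus: the most frequent base in each alignment column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in columns)

# Toy inputs (hypothetical): four candidates covering positions 0..9.
candidates = {
    "a": {0, 1, 2, 3, 4},
    "b": {4, 5, 6},
    "c": {5, 6, 7, 8, 9},
    "d": {0, 9},
}
chosen, uncovered = greedy_set_cover(range(10), candidates)
consensus = majority_consensus(["AAC", "GGG", "TTA"])
```

The greedy heuristic does not guarantee a minimum cover, but it is simple and scales to large inputs, which is a common reason to choose it.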
Furthermore, we have applied our tools to the analysis of a novel oral microbiome data set, and have discovered interesting functional mechanisms and ecological changes underlying the transition from health to periodontal disease in the human mouth at a system level.

Item Mathematical modeling of drug resistance and cancer stem cells dynamics (2010) Tomasetti, Cristian; Levy, Doron; Dolgopyat, Dmitry; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In this dissertation we consider the dynamics of drug resistance in cancer and the related issue of the dynamics of cancer stem cells. Our focus is only on resistance that is caused by random genetic point mutations. A very simple system of ordinary differential equations allows us to obtain results that are comparable to those found in the literature, with one important difference. We show that the amount of resistance that is generated before the beginning of the treatment, and which is present at some given time afterward, always depends on the turnover rate, no matter how many drugs are used. Previous work in the literature indicated no dependence on the turnover rate in the single-drug case but a strong dependence in the multi-drug case. We develop a new methodology to estimate the probability of developing drug resistance by the time a tumor is diagnosed and the expected number of drug-resistant cells found at detection, if resistance is present. Our modeling methodology may be seen as more general than previous approaches, in the sense that, at least for the wild-type population, we make assumptions only on their averaged behavior (no Markov property, for example). Importantly, the heterogeneity of the cancer population is taken into account.
Moreover, in the case of chronic myeloid leukemia (CML), a cancer of the white blood cells, we are able to infer the preferred mode of division of the hematopoietic cancer stem cells, predicting a large shift from asymmetric division to symmetric renewal. We extend our results by relaxing the assumption on the average growth of the tumor, thus going beyond the standard exponential case, and show that our results may also be a good approximation for much more general forms of tumor growth models. Finally, after reviewing the basic modeling assumptions and main results found in the mathematical modeling literature on CML, we formulate a new hypothesis on the effects that the drug Imatinib has on leukemic stem cells. Based on this hypothesis, we obtain new insights into the dynamics of the development of drug resistance in CML.
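The turnover-rate dependence described earlier in this abstract can be illustrated with a small sketch. It assumes one simple two-population system, N' = (L - D) N for drug-sensitive cells and R' = (L - D) R + u N for resistant cells, with birth rate L, death rate D, and mutation rate u. This system, the parameter values, and the function names are illustrative assumptions and are not taken from the abstract.

```python
import math

def resistant_at_detection(u, birth, death, n0, m):
    """Closed-form R(t*) for the assumed system, starting from R(0) = 0.

    N(t) = n0 * exp(g t) and R(t) = u * n0 * t * exp(g t), with g = birth - death.
    Detection time t* solves N(t*) = m, so R(t*) = u * m * t*.
    """
    growth = birth - death
    if growth <= 0:
        raise ValueError("birth rate must exceed death rate for the tumor to grow")
    t_star = math.log(m / n0) / growth
    return u * m * t_star

def simulate(u, birth, death, n0, m, dt=1e-3):
    """Forward-Euler integration of the same assumed system until N reaches m."""
    growth = birth - death
    n, r = float(n0), 0.0
    while n < m:
        r += dt * (growth * r + u * n)  # resistant cells: growth plus new mutants
        n += dt * growth * n            # sensitive cells: net exponential growth
    return r

# Same detection size and mutation rate, different turnover (death/birth).
no_turnover = resistant_at_detection(1e-6, 1.0, 0.0, 1.0, 1e6)
half_turnover = resistant_at_detection(1e-6, 1.0, 0.5, 1.0, 1e6)
```

Under these assumptions, a tumor with higher turnover takes longer to reach the same detection size and therefore accumulates more resistant cells, even in the single-drug case.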