Mathematics

Permanent URI for this community: http://hdl.handle.net/1903/2261

Search Results

  • Simultaneous transcriptional profiling of Leishmania major and its murine macrophage host cell reveals insights into host-pathogen interactions
    (Springer Nature, 2015-12-29) Dillon, Laura A. L.; Suresh, Rahul; Okrah, Kwame; Corrada Bravo, Hector; Mosser, David M.; El-Sayed, Najib M.
    Parasites of the genus Leishmania are the causative agents of leishmaniasis, a group of diseases that range in manifestations from skin lesions to fatal visceral disease. The life cycle of Leishmania parasites is split between its insect vector and its mammalian host, where it resides primarily inside of macrophages. Once intracellular, Leishmania parasites must evade or deactivate the host's innate and adaptive immune responses in order to survive and replicate. We performed transcriptome profiling using RNA-seq to simultaneously identify global changes in murine macrophage and L. major gene expression as the parasite entered and persisted within murine macrophages during the first 72 h of an infection. Differential gene expression, pathway, and gene ontology analyses enabled us to identify modulations in host and parasite responses during an infection. The most substantial and dynamic gene expression responses by both macrophage and parasite were observed during early infection. Murine genes related to both pro- and anti-inflammatory immune responses and glycolysis were substantially upregulated and genes related to lipid metabolism, biogenesis, and Fc gamma receptor-mediated phagocytosis were downregulated. Upregulated parasite genes included those aimed at mitigating the effects of an oxidative response by the host immune system while downregulated genes were related to translation, cell signaling, fatty acid biosynthesis, and flagellum structure. The gene expression patterns identified in this work yield signatures that characterize multiple developmental stages of L. major parasites and the coordinated response of Leishmania-infected macrophages in the real-time setting of a dual biological system. 
    This comprehensive dataset offers a clearer and more sensitive picture of the interplay between host and parasite during intracellular infection, providing additional insights into how pathogens are able to evade host defenses and modulate the biological functions of the cell in order to survive in the mammalian environment.
  • Analysis and correction of compositional bias in sparse sequencing count data
    (Springer Nature, 2018-11-06) Kumar, M. Senthil; Slud, Eric V.; Okrah, Kwame; Hicks, Stephanie C.; Hannenhalli, Sridhar; Bravo, Héctor Corrada
    Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it.
  • Shape Analysis of High-throughput Genomics Data
    (2015) Okrah, Kwame; Corrada Bravo, Hector; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    RNA sequencing refers to the use of next-generation sequencing technologies to characterize the identity and abundance of target RNA species in a biological sample of interest. The recent improvement and reduction in the cost of next-generation sequencing technologies have been paralleled by the development of statistical methodologies to analyze the data they produce. Coupled with the reduction in cost is the increase in the complexity of experiments. Some of the old challenges still remain. For example, the issue of normalization is now more important than ever. Some of the crude assumptions made in the early stages of RNA sequencing data analysis were necessary since the technology was new and untested, the number of replicates was small, and the experiments were relatively simple. One of the many uses of RNA sequencing experiments is the identification of genes whose abundance levels are significantly different across various biological conditions of interest. Several methods have been developed to answer this question. Some of these newly developed methods are based on the assumption that the data observed or a transformation of the data are relatively symmetric with light tails, usually summarized by assuming a Gaussian random component. It is indeed very difficult to assess this assumption for small sample sizes (e.g. sample sizes in the range of 4 to 30). In this dissertation, we utilize L-moments statistics as the basis for normalization, exploratory data analysis, the assessment of distributional assumptions, and the hypothesis testing of high-throughput transcriptomic data. In particular, we introduce a new normalization method for high-throughput transcriptomic data that is a modification of quantile normalization. We use L-moments ratios for assessing the shape (skewness and kurtosis statistics) of high-throughput transcriptome data.
    Based on these statistics, we propose a test for assessing whether the shapes of the observed samples differ across biological conditions. We also illustrate the utility of this framework to characterize the robustness of distributional assumptions made by statistical methods for differential expression. We apply it to RNA-seq data and find that methods based on the simple t-test for differential expression analysis using L-moments statistics as weights are robust. Finally, we provide an algorithm based on L-moments ratios for identifying genes with distributions that are markedly different from the majority in the data.
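    For readers unfamiliar with the L-moments ratios mentioned in the abstract above, the following is a minimal sketch of how sample L-skewness and L-kurtosis can be estimated from a single sample using the standard unbiased probability-weighted-moment estimators. It is an illustration only and is not taken from the dissertation; the function names `sample_l_moments` and `l_moment_ratios` are hypothetical.

```python
from math import comb


def sample_l_moments(values):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Hypothetical helper for illustration. Uses the unbiased
    probability-weighted moments b_0..b_3 of the sorted sample.
    """
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("at least 4 observations are needed for l4")

    # b[r] = (1/n) * sum_i C(i, r) / C(n-1, r) * x_(i), with i counted from 0
    b = [
        sum(comb(i, r) * x[i] for i in range(n)) / (n * comb(n - 1, r))
        for r in range(4)
    ]

    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4


def l_moment_ratios(values):
    """Return (L-skewness, L-kurtosis) = (l3 / l2, l4 / l2)."""
    _, l2, l3, l4 = sample_l_moments(values)
    if l2 == 0:
        raise ValueError("L-scale is zero; ratios are undefined")
    return l3 / l2, l4 / l2


if __name__ == "__main__":
    # A symmetric sample has L-skewness close to zero;
    # a sample with a long right tail has positive L-skewness.
    print(l_moment_ratios([1, 2, 3, 4, 5]))
    print(l_moment_ratios([1, 1, 2, 10]))
```

    Because these ratios are built from linear combinations of order statistics, they are bounded and less sensitive to extreme values than the conventional moment-based skewness and kurtosis, which is one reason they are attractive for the small sample sizes the abstract describes.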