University of Maryland Libraries: Digital Repository at the University of Maryland

    Shape Analysis of High-throughput Genomics Data

    Okrah_umd_0117E_16368.pdf (16.43Mb)
    No. of downloads: 103

    Date
    2015
    Author
    Okrah, Kwame
    Advisor
    Corrada Bravo, Hector
    DRUM DOI
    https://doi.org/10.13016/M2T93B
    Abstract
    RNA sequencing (RNA-seq) refers to the use of next-generation sequencing technologies to characterize the identity and abundance of target RNA species in a biological sample of interest. Recent improvements in these technologies, and reductions in their cost, have been paralleled by the development of statistical methodologies to analyze the data they produce. As costs have fallen, experiments have grown more complex, yet some of the old challenges remain. Normalization, for example, is more important now than ever. Some of the crude assumptions made in the early stages of RNA sequencing data analysis were necessary because the technology was new and untested, the number of replicates was small, and the experiments were relatively simple. One of the many uses of RNA sequencing experiments is the identification of genes whose abundance levels differ significantly across biological conditions of interest. Several methods have been developed to answer this question. Some of these newly developed methods assume that the observed data, or a transformation of the data, are relatively symmetric with light tails, usually summarized by assuming a Gaussian random component. This assumption is very difficult to assess for small sample sizes (e.g., 4 to 30). In this dissertation, we use L-moments statistics as the basis for normalization, exploratory data analysis, the assessment of distributional assumptions, and hypothesis testing of high-throughput transcriptomic data. In particular, we introduce a new normalization method for high-throughput transcriptomic data that is a modification of quantile normalization. We use L-moments ratios to assess the shape (skewness and kurtosis statistics) of high-throughput transcriptome data.
    Based on these statistics, we propose a test for assessing whether the shapes of the observed samples differ across biological conditions. We also illustrate how this framework can characterize the robustness of the distributional assumptions made by statistical methods for differential expression. We apply the framework to RNA-seq data and find that differential expression methods based on the simple t-test, with L-moments statistics as weights, are robust. Finally, we provide an algorithm based on L-moments ratios for identifying genes whose distributions are markedly different from the majority in the data.
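    As a rough illustration of the shape statistics the abstract refers to, the sketch below computes the first two sample L-moments and the L-moment ratios for skewness and kurtosis of a single sample. It uses the standard unbiased estimators built from probability-weighted moments and is not taken from the dissertation; the function name `sample_l_moment_ratios` and the example values are hypothetical.

```python
def sample_l_moment_ratios(values):
    """Return (l1, l2, t3, t4) for one sample.

    l1 is the L-location (the mean), l2 the L-scale, and t3 = l3 / l2 and
    t4 = l4 / l2 are the L-skewness and L-kurtosis ratios.
    Assumption: standard unbiased estimators based on probability-weighted
    moments; this is not the dissertation's own implementation.
    """
    x = sorted(float(v) for v in values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations to estimate t4")

    # b[r] = (1/n) * sum over order statistics of
    #        [(i)(i-1)...(i-r+1)] / [(n-1)(n-2)...(n-r)] * x_(i), with i = rank - 1
    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x):
        weight = 1.0
        for r in range(4):
            b[r] += weight * xi
            if r < 3:
                weight *= (i - r) / (n - 1 - r)
    b = [total / n for total in b]

    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    if l2 == 0:
        raise ValueError("L-scale is zero; ratios are undefined for a constant sample")
    return l1, l2, l3 / l2, l4 / l2


# Hypothetical expression values for one gene across five replicates.
l1, l2, t3, t4 = sample_l_moment_ratios([1, 1, 1, 2, 10])
print(f"L-location={l1:.3f} L-scale={l2:.3f} L-skewness={t3:.3f} L-kurtosis={t4:.3f}")
```

    Because L-moments are linear combinations of order statistics, these ratios are less sensitive to outliers than conventional moment-based skewness and kurtosis, which is a commonly cited reason for using them with small samples.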
    URI
    http://hdl.handle.net/1903/16941
    Collections
    • Computer Science Theses and Dissertations
    • Mathematics Theses and Dissertations
    • UMD Theses and Dissertations

    DRUM is brought to you by the University of Maryland Libraries
    University of Maryland, College Park, MD 20742-7011 (301)314-1328.
    Please send us your comments.
    Web Accessibility
     

     
