MULTIVARIATE METHODS FOR HIGH-THROUGHPUT BIOLOGICAL DATA WITH APPLICATION TO COMPARATIVE GENOMICS
Phenotypic variation in multi-cellular organisms arises as a result of complex gene regulation mechanisms. The development of modern high-throughput technology opens up the possibility of genome-wide interrogation of aspects of these mechanisms across molecular phenotypes. Multivariate statistical methods provide convenient frameworks for modeling and analyzing data obtained from high-throughput experiments that probe these complex aspects. This dissertation presents multivariate statistical methods for analyzing data from two specific high-throughput molecular assays: (1) ribosome footprint profiling and (2) flow cytometry.
Ribosome footprint profiling captures an in vivo translation profile of a living cell and offers insights into the process of post-transcriptional gene regulation. Translation efficiency (TE), defined as the ratio of ribosome-protected fragment count to mRNA fragment count, quantifies the rate at which active translation is occurring for each gene. We introduce pairedSeq, an empirical covariance shrinkage method for differential testing of translation efficiency from sequencing data. The method draws on variance decomposition techniques in mixed-effect modeling and analysis of variance. Benchmark comparisons with existing methods reveal that pairedSeq effectively detects signals in genes with high variation in expression measurements across samples due to high co-variability between ribosome occupancy and transcript abundance. In contrast, existing methods tend to mistake genes with negative co-variability for signals, because they underestimate the variance when negative co-variability is not accounted for. We then present a genome-wide survey of primate species divergence at the translational and post-translational layers of gene regulation.
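The role of co-variability in the variance of TE can be illustrated with a small numerical sketch. On the log scale, TE is the difference between log ribosome-protected fragment (RPF) count and log mRNA count, so its variance equals Var(log RPF) + Var(log mRNA) - 2 Cov(log RPF, log mRNA). The code below is not the pairedSeq implementation; it is a minimal illustration of this identity. The function names, the pseudocount of 0.5, and the example counts are all assumptions made for the illustration.

```python
import math
from statistics import mean, variance


def log_counts(counts, pseudocount=0.5):
    """Log2-transform raw counts; the pseudocount (an assumption) avoids log(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def log_te(rpf_counts, mrna_counts, pseudocount=0.5):
    """Per-sample log2 translation efficiency: log2(RPF) - log2(mRNA)."""
    log_rpf = log_counts(rpf_counts, pseudocount)
    log_mrna = log_counts(mrna_counts, pseudocount)
    return [r - m for r, m in zip(log_rpf, log_mrna)]


def sample_covariance(x, y):
    """Unbiased sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def log_te_variances(rpf_counts, mrna_counts, pseudocount=0.5):
    """Return (variance with the covariance term, variance ignoring it)."""
    log_rpf = log_counts(rpf_counts, pseudocount)
    log_mrna = log_counts(mrna_counts, pseudocount)
    var_sum = variance(log_rpf) + variance(log_mrna)
    cov = sample_covariance(log_rpf, log_mrna)
    return var_sum - 2 * cov, var_sum


# Hypothetical counts for one gene across four samples.
rpf = [100, 200, 400, 800]
mrna_positive = [50, 100, 200, 400]   # mRNA rises with RPF (positive co-variability)
mrna_negative = [400, 200, 100, 50]   # mRNA falls as RPF rises (negative co-variability)

paired_pos, unpaired_pos = log_te_variances(rpf, mrna_positive)
paired_neg, unpaired_neg = log_te_variances(rpf, mrna_negative)
```

With positive co-variability, ignoring the covariance term overstates the variance of log TE; with negative co-variability, it understates the variance, which is the situation in which a test that ignores the pairing can produce spurious signals.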
Flow cytometry (FCM) is routinely employed to measure cellular characteristics such as mRNA and protein expression at the single-cell level. While many computational methods have been developed to identify cell populations in individual FCM samples, very few address how the identified cell populations can be matched across samples for comparative analysis. FlowMap-FR can be used to quantify the similarity between cell populations under scenarios of proportion differences and modest position shifts, and to identify situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. It has been implemented as a stand-alone R/Bioconductor package that can be easily incorporated into current FCM data analysis workflows.