NEW STATISTICAL METHODS FOR HIGH-DIMENSIONAL DATA WITH COMPLEX STRUCTURES
Abstract
Rapid advances in biomedical technology have made high-dimensional biomedical data with complex, organized structures widely available. However, because true signals are obscured by substantial false-positive noise and by the high dimensionality itself, statistical inference is challenging, and research reproducibility and replicability become critical concerns. Motivated by these needs, this dissertation develops statistical approaches for understanding the latent structures among biomedical objects, improving statistical power, and reducing false-positive errors in statistical inference.
The first objective of this dissertation is motivated by group-level brain connectome analysis in neuropsychiatric research, where the goal is to reveal connectivity abnormalities between clinical groups. In Chapter 2, we develop a likelihood-based adaptive dense subgraph discovery (ADSD) procedure to identify connectomic subnetworks (subgraphs) that are systematically associated with brain disorders. We propose a statistical inference procedure that leverages graph properties and combinatorics. We validate the proposed method using a brain fMRI study of schizophrenia and synthetic data under various settings.
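To make the notion of a dense subgraph concrete, the following is a minimal sketch of a generic greedy peeling heuristic for the densest-subgraph problem. It is not the likelihood-based ADSD procedure from Chapter 2. The function name `greedy_densest_subgraph`, the density definition (total edge weight divided by number of nodes), and the toy graph are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

def greedy_densest_subgraph(W):
    """Greedy peeling on a symmetric, non-negative weight matrix W.

    Hypothetical helper for illustration only (not the ADSD procedure).
    Repeatedly removes the node with the smallest weighted degree and
    returns the node set with the highest density seen, where
    density = (total edge weight) / (number of nodes).
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    active = np.ones(n, dtype=bool)
    # Weighted degree of each node, ignoring self-loops.
    degree = W.sum(axis=1) - np.diag(W)
    total = degree.sum() / 2.0          # each edge is counted twice in degrees
    best_nodes = np.flatnonzero(active)
    best_density = total / n
    for _ in range(n - 1):
        # Pick the active node with the smallest current degree.
        v = int(np.argmin(np.where(active, degree, np.inf)))
        active[v] = False
        total -= degree[v]              # drop edges from v to remaining nodes
        degree -= W[:, v]               # neighbors lose their edge to v
        density = total / active.sum()
        if density > best_density:
            best_density = density
            best_nodes = np.flatnonzero(active)
    return best_nodes, best_density

# Toy example (assumed data): nodes 0-3 form a clique, 4 hangs off 0, 5 hangs off 4.
A = np.zeros((6, 6))
for i in range(4):
    for j in range(i + 1, 4):
        A[i, j] = A[j, i] = 1.0
A[0, 4] = A[4, 0] = 1.0
A[4, 5] = A[5, 4] = 1.0
nodes, density = greedy_densest_subgraph(A)
print(nodes, density)
```

In a connectome setting, `W` could hold edge-wise evidence of group differences (for example, thresholded test statistics), so that a dense subgraph corresponds to a subnetwork whose edges are collectively altered. That mapping is an assumption made here for illustration.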
In Chapter 3, we assess genetic effects on brain structural imaging with spatial specificity. Rather than performing inference on individual SNP-voxel pairs, we focus on the systematic associations between genetic and imaging measurements, which helps characterize the polygenic and pleiotropic association structure. Based on voxel-wise genome-wide association analysis (vGWAS), we characterize this SNP-voxel association structure using imaging-genetics dense bi-cliques (IGDBs). We develop an estimation procedure and a statistical inference framework for the IGDBs with computationally efficient algorithms. We demonstrate the performance of the proposed approach using imaging-genetics data from the Human Connectome Project (HCP).
Chapter 4 analyzes gene co-expression networks (GCNs) to examine gene-gene interactions and to learn the underlying complex yet highly organized gene regulatory mechanisms. We propose the interconnected community network (ICN) structure, which allows interactions between genes from different communities and thereby relaxes a constraint imposed by most existing GCN analysis approaches. We develop a computational package to detect the ICN structure based on graph norm shrinkage. We illustrate ICN detection using RNA-seq data from The Cancer Genome Atlas (TCGA) Acute Myeloid Leukemia (AML) study.