UMD Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/3

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means there may be a delay of up to 4 months before a given thesis/dissertation appears in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

  • Application of Causal Inference in Large-Scale Biomedical Data
    (2024) Zhao, Zhiwei; Chen, Shuo; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation contains three projects that tackle challenges in applying causal inference to large-scale biomedical data. Project 1 proposes a novel mediation analysis framework for settings with multiple mediators and multiple outcomes. It can efficiently extract the mediation pathway and estimate the mediation effects of multiple mediators simultaneously. The effectiveness of the proposed method is validated through extensive simulations and a real-data application to a human connectome study. Project 2 introduces a double machine learning method, assisted by an algorithm ensemble, for estimating longitudinal causal effects. This approach reduces estimation bias and accommodates high-dimensional covariates. The validity of the proposed method is supported by simulation studies and an application to Adolescent Brain Cognitive Development data, specifically evaluating the impact of sleep insufficiency on youth cognitive development. Project 3 develops a new bias-reduction estimator that addresses unmeasured confounding by leveraging proximal learning and negative control outcome techniques. This method can handle a moderate number of exposures and multivariate outcomes in the presence of unmeasured confounders. Both numerical experiments and a data application using the UK Biobank demonstrate that the proposed method effectively reduces estimation bias caused by unmeasured confounding. Collectively, these three projects introduce innovative methodologies for causal inference in neuroimaging: they advance mediation analysis, improve longitudinal causal effect estimation, and reduce estimation bias in the presence of unmeasured confounding.
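Project 2 builds on double machine learning. As a rough illustration of its core idea (cross-fitting plus partialling out), the sketch below estimates a single effect θ in a cross-sectional partially linear model y = θ·d + g(x) + ε. This is a simplification under stated assumptions, not the dissertation's method: the data are simulated, the names `ols_fit` and `dml_plm` are hypothetical, and a one-covariate linear regression stands in for the algorithm ensemble and the longitudinal structure described above.

```python
import random


def ols_fit(x, y):
    """Fit y ~ a + b*x by least squares and return (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dml_plm(x, d, y, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_d, res_y = [0.0] * n, [0.0] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Nuisance models E[d|x] and E[y|x] are fit only on the other folds.
        ad, bd = ols_fit([x[i] for i in train], [d[i] for i in train])
        ay, by = ols_fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            res_d[i] = d[i] - (ad + bd * x[i])
            res_y[i] = y[i] - (ay + by * x[i])
    # Final stage: regress outcome residuals on treatment residuals (no intercept).
    return sum(rd * ry for rd, ry in zip(res_d, res_y)) / sum(rd * rd for rd in res_d)


# Simulated data with a true effect of 2.0 (illustrative only).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
d = [xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * di + xi + rng.gauss(0, 1) for xi, di in zip(x, d)]
theta = dml_plm(x, d, y)
print(round(theta, 2))  # should land close to 2.0
```

Cross-fitting keeps each observation's nuisance predictions out of the sample used to fit them, which is what lets flexible learners be swapped in for `ols_fit` without overfitting bias leaking into θ.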
  • Variable selection and causal discovery methods with application in noncoding RNA regulation of gene expression
    (2024) Ke, Hongjie; Ma, Tianzhou; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Noncoding RNAs (ncRNAs), including long noncoding RNAs (lncRNAs) and microRNAs (miRNAs), are critical regulators of gene expression at multiple levels. Revealing how ncRNAs regulate their target genes in disease-associated pathways can provide mechanistic insight into disease and has potential clinical applications. In this dissertation, we developed novel variable selection and causal discovery methods to study the regulatory relationships between ncRNAs and genes. In Chapter 2, we proposed a novel screening method based on robust partial correlation to identify ncRNA regulators of gene expression across the whole genome. In Chapter 3, we developed a computationally efficient two-stage Bayesian network (BN) learning method to construct ncRNA-gene regulatory networks from transcriptomic data on both coding genes and ncRNAs. To accompany this BN learning method, we provided a novel analytical platform with a graphical user interface (GUI) that covers the entire pipeline of data preprocessing, network construction, module detection, visualization, and downstream analyses. In Chapter 4, we proposed a hierarchical Bayesian indicator variable selection model to uncover how the regulatory mechanisms between ncRNAs and genes change across biological conditions (e.g., cancer stages). In Chapter 5, we discussed potential extensions and future work. This dissertation presents computationally efficient and statistically rigorous methods that jointly analyze high-dimensional ncRNA and gene expression data to investigate their regulatory relationships, which will deepen our understanding of the molecular mechanisms of disease.
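Chapter 2 screens ncRNAs by robust partial correlation. The robust estimator is not specified here, so the sketch below substitutes ordinary Pearson partial correlation with a single adjustment variable, computed by correlating regression residuals, and ranks candidate regulators by its absolute value. The function names, the toy data, and the single-confounder setup are illustrative assumptions.

```python
import math


def residuals(v, z):
    """Residuals of v after a least-squares fit v ~ a + b*z."""
    n = len(v)
    mv, mz = sum(v) / n, sum(z) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    b = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / szz
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """Pearson partial correlation of x and y given z (not a robust version)."""
    return pearson(residuals(x, z), residuals(y, z))


def screen(gene, candidates, z, top_k):
    """Rank candidate regulators by |partial correlation| with the gene; keep top_k."""
    scores = {name: abs(partial_corr(expr, gene, z)) for name, expr in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data: ncRNA_A tracks the gene beyond z; ncRNA_B is related to the gene only via z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
e = [1, -1, 1, -1, -1, 1, -1, 1]
f = [1, 1, -1, -1, -1, -1, 1, 1]
gene = [zi + ei for zi, ei in zip(z, e)]
candidates = {
    "ncRNA_A": [2 * ei + 3 for ei in e],
    "ncRNA_B": [2 * zi + fi for zi, fi in zip(z, f)],
}
print(screen(gene, candidates, z, top_k=1))  # ['ncRNA_A']
```

Although `ncRNA_B` is strongly correlated with the gene marginally, its partial correlation given `z` is zero, which is the kind of indirect association that partial-correlation screening is meant to discount.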
  • COVID-19 Vaccine Hesitancy and Uptake in the United States Considered Through the Lens of Health Behavior Theory
    (2024) Kauffman, Lauren Emily; Nguyen, Quynh; Epidemiology and Biostatistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Given the low COVID-19 vaccine uptake rates in many areas of the United States despite the vaccines' demonstrated safety and effectiveness, COVID-19 vaccine hesitancy and vaccination barriers remain critical areas of research in epidemiology and behavioral health science. This series of studies examines COVID-19 vaccine hesitancy and vaccination barriers, as they relate to vaccination intention and vaccine uptake, in the context of established health behavior theories. The first study is a systematic review of existing COVID-19 vaccine hesitancy research that used one or more health behavior theories as key components of the design or analysis. It examined which theories are used most often, how they are used, and where research gaps exist. The remaining two studies use data from the U.S. COVID-19 Trends and Impact Survey, a national cross-sectional survey. The second study investigates the association between recent feelings of anxiety or depression and vaccination intention, as well as between these feelings and identifying with specific vaccine hesitancy reasons. The third study examines vaccine hesitancy and barriers among people with chronic illness or disease, a particularly vulnerable population. Factor analysis was conducted using constructs from the Theory of Planned Behavior as a framework, and the resulting factors were used in a regression model to investigate their association with vaccination intention. This research demonstrated the usefulness of the Theory of Planned Behavior, the Health Belief Model, and the 3 Cs Model in existing and future COVID-19 vaccine hesitancy research, and identified Protection Motivation Theory as a promising area for future research. Additionally, psychological states were significantly associated with vaccine hesitancy after adjusting for demographic, socioeconomic, and time factors.
    Lastly, the Theory of Planned Behavior was found to be applicable to unvaccinated individuals with chronic illness, as the construct factor scores developed were significantly associated with vaccine hesitancy (adjusting for the presence of specific chronic conditions and for demographic, socioeconomic, and time factors). These associations were also consistently demonstrated in subgroup analyses of participants with specific chronic conditions.
  • EVALUATING THE EFFECTS OF MODIFIABLE LIFESTYLE AND CARDIOVASCULAR HEALTH FACTORS ON DIABETES LIFE EXPECTANCY IN NHANES AND BRAIN AGING IN UK BIOBANK
    (2024) Feng, Li; Lei, David K.Y.; Ma, Tianzhou; Nutrition; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation explored the impact of lifestyle and cardiovascular health factors on aging, focusing on individuals with diabetes, the effects of blood pressure on brain aging, and the influence of cardiovascular health and genetic predisposition on brain white matter aging. The first study examined trends in lifestyle quality among US adults with type 2 diabetes from 1999 to 2018 using NHANES data from 7,410 participants. A healthy lifestyle score encompassing smoking, drinking, physical activity, and diet increased slightly over the years, but significant disparities across socioeconomic groups persisted. Adherence to low-risk lifestyle factors was associated with a 55%-57% lower risk of all-cause mortality, independent of cardiovascular risk control, emphasizing the importance of lifestyle modification in diabetes management. The second study investigated the causal effect of elevated blood pressure on white matter brain aging in 228,473 UK Biobank participants of European ancestry aged 40-69, using two-sample Mendelian randomization. The results revealed that high blood pressure, particularly diastolic, accelerated the machine-learning-derived white matter brain age gap (based on white matter microstructural integrity measured by fractional anisotropy from diffusion tensor imaging data), with evidence of a causal effect in late-middle-aged women. This underscores the importance of blood pressure control in preventing brain aging, especially in post-menopausal women. Lastly, the impact of Life's Essential 8 (LE8), a comprehensive measure of cardiovascular health (lifestyle components: diet, smoking, physical activity, and sleep; health components: BMI, blood sugar, blood pressure, and blood lipids), on white matter brain aging was assessed, with a particular focus on how the APOE4 genotype modifies this relationship.
    Analyzing data from 18,817 UK Biobank participants of European ancestry aged 40-60, the study found that higher LE8 scores were associated with a younger brain age. The effect varied significantly with APOE4 status, highlighting the need for personalized health strategies based on genetic profiles. In conclusion, these studies highlight the crucial role of modifiable lifestyle and health factors in managing chronic diseases, controlling blood pressure, and maintaining brain health, and emphasize integrating genetic profiles into personalized healthcare.
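The second study uses two-sample Mendelian randomization, which combines SNP-exposure and SNP-outcome associations estimated in different samples. The abstract does not say which estimator was used; the sketch below shows the fixed-effect inverse-variance-weighted (IVW) estimator with first-order weights, one common choice, applied to made-up summary statistics rather than UK Biobank results. The function and variable names are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error (first-order weights)."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    return num / den, 1 / math.sqrt(den)


# Hypothetical per-SNP summary statistics (not real data).
bx = [0.10, 0.20, 0.30]    # SNP -> exposure effects (e.g., blood pressure)
by = [0.05, 0.10, 0.15]    # SNP -> outcome effects (e.g., brain age gap)
se_y = [0.01, 0.02, 0.01]  # standard errors of the SNP -> outcome effects
est, se = ivw_estimate(bx, by, se_y)
print(round(est, 3), round(se, 4))  # 0.5 0.0302
```

The estimate equals a weighted regression of `by` on `bx` through the origin; it is valid only if every SNP satisfies the instrumental-variable assumptions, which is why sensitivity analyses are usually reported alongside it.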
  • DISSECTING TUMOR CLONALITY IN LIVER CANCER: A PHYLOGENY ANALYSIS USING COMPUTATIONAL AND STATISTICAL TOOLS
    (2023) Kacar, Zeynep; Slud, Eric; Levy, Doron; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Liver cancer is a heterogeneous disease characterized by extensive genetic and clonal diversity. Understanding the clonal evolution of liver tumors is crucial for developing effective treatment strategies. This dissertation aims to dissect tumor clonality in liver cancer using computational and statistical tools, with a focus on phylogenetic analysis. Through advances in defining and assessing phylogenetic clusters, we gain a deeper understanding of survival disparities and clonal evolution within liver tumors, which can inform the development of tailored treatment strategies and improve patient outcomes. The thesis begins with an overview of the sources of heterogeneity in liver cancer and of the data types used, from Whole-Exome sequencing (WEX) and RNA sequencing (RNA-seq) read counts by gene to derived quantities such as Copy Number Alterations (CNAs) and Single Nucleotide Variants (SNVs). Various tools for deriving copy numbers are discussed and compared, as are methods for comparing survival distributions. The central data analyses of the thesis concern the derivation of distinct clones and clustered phylogeny types from the basic genomic data in three independent cancer cohorts: TCGA-LIHC, TIGER-LC, and NCI-MONGOLIA. The SMASH (Subclone multiplicity allocation and somatic heterogeneity) algorithm is introduced for clonality analysis, followed by a discussion of clustering analysis of nonlinear tumor evolution trees and the construction of phylogenetic trees for the liver cancer cohorts. Drivers of tumor evolution and the immune-cell microenvironment of tumors are also explored. We employ survival analysis tools to investigate and document survival differences between groups of subjects defined by phylogenetic clusters.
    Specifically, we introduce the log-rank test and its modifications for generic right-censored survival data, which we then apply to survival follow-up data for subjects in the studied cohorts, clustered based on their genomic data. The final chapter extends an existing methodology for covariate adjustment in the two-sample log-rank test to the K-sample setting, with a specific focus on the previously defined phylogeny cluster groups. This extension is not straightforward because the K-sample test statistic and its asymptotic null distribution do not follow directly from the two-sample case. Using these extended tools, we conduct an illustrative analysis of real data from the TIGER-LC cohort, which comprises subjects with analyzed and clustered genomic data, yielding phylogenetic clusters associated with two different types of liver cancer. By applying the extended methodology to this dataset, we aim to assess and validate the survival curves of the defined clusters.
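The unadjusted two-sample log-rank test that this work starts from can be sketched as follows: at each distinct event time, compare the observed number of events in group 1 with the number expected under equal hazards, sum the differences, and scale by the hypergeometric variance. This is only the textbook statistic; the covariate adjustment and the K-sample extension developed in the final chapter are not implemented, and the function name and toy data are illustrative.

```python
import math


def logrank_test(times, events, groups):
    """Two-sample log-rank test for right-censored data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    groups: 0/1 group labels. Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]  # subjects still at risk at t
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)  # all events at t
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Toy data: group 1 fails earlier than group 0; no censoring.
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(round(chi2, 3))  # 2.882
```

Censored subjects contribute only through the at-risk counts, which is how the test uses right-censored follow-up without imputing event times.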
  • Statistical Network Analysis of High-Dimensional Neuroimaging Data With Complex Topological Structures
    (2023) Lu, Tong; Chen, Shuo; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    This dissertation contains three projects that collectively tackle statistical challenges in high-dimensional brain connectome data analysis and enhance our understanding of the intricate workings of the human brain. Project 1 proposes a novel network method for detecting brain-disease-related alterations in voxel-pair-level functional connectivity with spatial constraints, thereby improving spatial specificity and sensitivity. Its effectiveness is validated through extensive simulations and real-data applications in nicotine addiction and schizophrenia studies. Project 2 introduces a multivariate multiple imputation method designed for high-dimensional voxel-level neuroimaging data, based on Bayesian models and Markov chain Monte Carlo (MCMC) sampling. On both synthetic data and real neurovascular water exchange data extracted from a schizophrenia neuroimaging study, the method achieves high imputation accuracy and computational efficiency. Project 3 develops a multi-level network model based on graph combinatorics that captures vector-to-matrix associations between brain structural imaging measures and functional connectomic networks. The validity of the proposed model is supported by extensive simulations and a real structure-function imaging dataset from the UK Biobank. Together, these projects contribute methods and insights that advance neuroimaging data analysis, improving spatial specificity, statistical power, imputation accuracy, and computational efficiency in revealing the brain’s complex neurological patterns.
  • Semiparametric Analysis of Multivariate Panel Count Data with an Informative Observation Process
    (2023) Chen, Chang; He, Xin; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Panel count data and recurrent event data often arise in event history studies. Unlike recurrent event data, which are collected from studies that monitor subjects continuously, panel count data arise when subjects are observed only at discrete time points. In such cases, the exact event times are unknown; only the numbers of events between successive observation times are recorded. Statistical analysis of panel count data has been based on two stochastic processes: an observation process and a response process that characterizes the occurrences of the events of interest. The first part of the dissertation presents a likelihood-based joint modeling procedure for the regression analysis of univariate panel count data with dependent observation equations and time processes. The inference procedure involves estimating equations and an EM algorithm for estimating all parameters. In the second part, we extend the proposed methods to multivariate panel count data, which arise when a recurrent event study involves several related types of recurrent events. In particular, we present three types of multivariate modeling scenarios and the corresponding inference procedures. A model checking procedure is developed for the proposed univariate models and all three types of multivariate models. Simulation studies indicate that the proposed inference procedures perform well and consistently across various situations. The proposed methods are applied to a skin cancer study with bivariate panel count data on the occurrences of two related types of non-melanoma skin cancer.
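To make the data structure concrete: panel count data record, for each subject, only the observation times and the number of events in each interval between consecutive observations. The hypothetical helper below converts a (normally unobserved) set of event times into such counts, assuming follow-up starts at time 0 and each interval is closed on the right. It illustrates the data format only, not the joint models or EM algorithm described above.

```python
from bisect import bisect_right


def to_panel_counts(event_times, obs_times, start=0.0):
    """Number of events in each interval (previous observation, current observation]."""
    events = sorted(event_times)
    counts, prev = [], start
    for t in sorted(obs_times):
        # bisect_right gives the number of events at or before a time point.
        counts.append(bisect_right(events, t) - bisect_right(events, prev))
        prev = t
    return counts


# One subject: four events (e.g., tumor occurrences) but only three clinic visits.
print(to_panel_counts([0.5, 1.2, 1.9, 4.0], [1.0, 2.0, 5.0]))  # [1, 2, 1]
```

Only the visit times and the counts `[1, 2, 1]` would be available to the analyst; when visit timing depends on the subject's condition, the observation process is informative, which is the setting the dissertation addresses.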
  • UNDERSTANDING HONEY BEE COLONY MORBIDITY AND MORTALITY THROUGH PHYSIOLOGY AND LIFESPAN
    (2022) Nearman, Anthony James; vanEngelsdorp, Dennis; Entomology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Managed honey bee colonies (Apis mellifera) are a critical component of our agroecosystem. As such, we need to understand and address the risk factors that contribute to colony loss. Fundamental to this understanding is a need to detail the connection between individual bees’ physiology, life histories, and colony fitness. In this dissertation I first present an in-depth review of honey bee physiologies important for colony success (Chapter 1); I then examine standard methods for rearing honey bees in a laboratory setting and the importance of individual bee lifespan for colony loss (Chapter 2); next, I identify honey bee physiologies that relate to chronological age as a means of measuring colony demographics and health (Chapter 3); and finally, I apply potential age- and disease-related physiological measures to determine their associations with overwinter colony loss and with known and unknown risk factor exposure (Chapter 4). Research indicates that honey bee colony loss is largely driven by poor nutrition, pesticide exposure, and parasites and the viruses they vector. Management practices and techniques that mitigate the effects of these risk factors reduce loss rates but do not prevent all losses. New knowledge is therefore needed to bridge the gap between risk exposure and colony mortality. Because a honey bee colony is a complex interaction among multiple groups of individual bees, collective physiological changes among these groups hold promise for understanding why some colonies die while others do not when exposed to the same risk factors. In one experiment (Chapter 2), I demonstrate the importance of access to water for honey bee lifespan. In a literature review informed by the data from these experiments, I found that the median lifespan of laboratory specimens has halved over the past 50 years and that this change is predictive of overwinter loss rates reported by beekeepers since 2006.
    If the age of individual bees can affect the lifespan of a colony, I posited that physiological measures predictive of individual bee age could be used to ascertain the demographics of a colony’s population, which would in turn be a measure of colony health. To test this hypothesis, I built upon previous physiology studies and examined age-linked cohorts of bees through the fall transition to overwintering. In doing so, I derived a set of easily identifiable physiological measures predictive either of individual bee age or of a possible unidentified disease state. I then applied these measures in a retrospective cohort study and determined that changes in the prevalence of several physiologies were associated with overwinter mortality and known risk factor exposure. These methods and results show promise for physiological measures as a pragmatic tool to predict colony survivorship, to diagnose past known and unknown risk factor exposures, and to further advance fundamental knowledge of the role demographics play in societal health.
  • Causal Survival Analysis – Machine Learning Assisted Models: Structural Nested Accelerated Failure Time Model and Threshold Regression
    (2022) Chen, Yiming; Lee, Mei-Ling; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Time-varying confounding of the intervention complicates causal survival analysis when data are collected longitudinally. Traditional survival models that only adjust for time-dependent covariates yield biased causal conclusions about the intervention effect. Several techniques have been developed to address this challenge; nevertheless, these existing methods may still lack power and can be computationally burdensome for high-dimensional data with temporal dependence. The first part of this dissertation focuses on one method that deals with time-varying confounding: the Structural Nested Model and the associated G-estimation. Two neural networks (GE-SCORE and GE-MIMIC) are proposed to estimate the Structural Nested Accelerated Failure Time Model. The proposed algorithms can provide less biased, individualized estimates of the causal effect of the intervention. The second part explores the causal interpretations and applications of the First-Hitting-Time-based Threshold Regression Model using a Wiener process. Moreover, a neural network extension of this specific type of Threshold Regression (TRNN) is explored for the first time.
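In first-hitting-time threshold regression, a latent health process is commonly modeled as a Wiener process that starts at a level y0 > 0 with drift mu and variance sigma² per unit time, and the event occurs when the process first reaches zero. The survival function then has the closed form implemented below; in regression use, y0 and mu are typically linked to covariates (for example, ln y0 and mu as linear predictors). This is a sketch of that standard building block under those assumptions, not of the TRNN model, and the function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def fht_survival(t, y0, mu, sigma=1.0):
    """P(T > t), where T is the first time a Wiener process started at y0 > 0
    with drift mu and variance sigma**2 per unit time reaches 0."""
    if t <= 0:
        return 1.0
    s = sigma * math.sqrt(t)
    return (norm_cdf((mu * t + y0) / s)
            - math.exp(-2 * y0 * mu / sigma ** 2) * norm_cdf((mu * t - y0) / s))


# Negative drift: the process moves toward the boundary, so survival decays toward 0.
for t in (1, 5, 20):
    print(t, round(fht_survival(t, y0=2.0, mu=-0.5), 3))
```

With mu > 0 the curve instead levels off at 1 - exp(-2·y0·mu/sigma²), the probability that the process never reaches the boundary, so the model can accommodate a cured or never-failing fraction.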
  • Bayesian Methods and Their Application in Neuroimaging Data
    (2022) Ge, Yunjiang; Kedem, Benjamin; Chen, Shuo; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The functional magnetic resonance imaging (fMRI) technique is widely used in the medical field because it allows in vivo investigation of human cognition, emotions, and behaviors at the neural level. One primary objective is to study brain activation, which can be achieved through a conventional two-stage approach: individualized voxel-specific modeling in the first stage and group-level inference in the second. Existing methods generally rely on pre-determined parameters or domain knowledge, which may not properly incorporate the features unique to different studies or cohorts and thus leave gaps in the inference for activated regions. This dissertation focuses on Bayesian approaches that fill these gaps in statistical inference at all levels while accounting for the information carried by the data. Cluster-wise statistical inference is the most widely used technique for fMRI data analysis. It consists of two steps: i) primary thresholding, which excludes less significant voxels using a pre-specified cut-off (e.g., p<0.001); and ii) cluster-wise thresholding, which is often obtained by counting the number of voxels within a cluster that surpass a voxel-level statistical significance threshold. The choice of the primary threshold is critical because it determines both the statistical power and the false discovery rate. However, in most existing statistical packages, the primary threshold is selected based on prior knowledge (e.g., p<0.001) without considering the information in the data. Thus, in the first project, we propose a data-driven approach that algorithmically selects the optimal primary threshold within an empirical Bayes framework. We evaluate the proposed model using extensive simulation studies and real fMRI data. In the simulations, we show that our method can effectively increase statistical power while controlling the false discovery rate.
    We then analyze fMRI scans to investigate the brain response to the dose effect of chlorpromazine in patients with schizophrenia, obtaining consistent results. In Chapter 3, we focus on controlling the family-wise error rate (FWER) by conducting cluster-level inference. The cluster-extent measure can be sub-optimal in terms of power and false positive error rate because the supra-threshold voxel count neglects voxel-wise significance levels and ignores the dependence between voxels. Based on the information that a cluster carries, we provide a new Integrated Cluster-wise significance Measure (ICM) for determining cluster-level significance in cluster-wise fMRI analysis by integrating cluster extent, voxel-level significance (e.g., p-values), and activation dependence among voxels within a cluster. We develop a computationally efficient strategy for ICM based on probabilistic approximation theory, which makes the computational load of ICM-based cluster-wise inference (e.g., permutation tests) affordable. We validate the proposed method via extensive simulations and then apply it to two fMRI data sets. The results demonstrate that ICM can improve power with well-controlled FWER. Whereas the preceding chapters focus on cluster-extent thresholding, Bayesian hierarchical models can also efficiently handle high-dimensional neuroimaging data. Existing methods provide voxel-specific inference and inference for pre-determined regions of interest (ROIs). However, activation clusters may span multiple ROIs or vary across studies and study cohorts. To provide inference that bridges voxels, unknown activation clusters, targeted regions, and the whole brain, we propose the Dirichlet Process Mixture model with Spatial Constraint (DPMSC) in Chapter 4. The spatial constraint is based on the Euclidean distance between two voxels in brain space.
    With this constraint applied at each iteration of Markov chain Monte Carlo (MCMC), DPMSC can efficiently remove single voxels or small noise clusters and provide whole contiguous clusters that belong to the same component of the mixture model. Finally, we provide a real data example and simulation studies based on various dataset features.
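The conventional two-step cluster-wise procedure described in this abstract (a primary voxel-level threshold, then a cluster-extent threshold on the count of supra-threshold voxels) can be sketched on a toy 2-D p-value map as follows. Real fMRI volumes are 3-D, commonly use 6-, 18-, or 26-connectivity, and derive the extent cut-off from permutation tests or random field theory; the grid, the 4-connectivity, the extent cut-off of 2 voxels, and the function name are illustrative assumptions. Neither the empirical Bayes threshold selection nor ICM is implemented here.

```python
from collections import deque


def cluster_extents(p_map, primary=0.001):
    """Sizes of 4-connected clusters of voxels with p < primary in a 2-D grid."""
    rows, cols = len(p_map), len(p_map[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if p_map[r][c] >= primary or seen[r][c]:
                continue
            # Breadth-first search over supra-threshold neighbours.
            seen[r][c] = True
            queue, size = deque([(r, c)]), 0
            while queue:
                i, j = queue.popleft()
                size += 1
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not seen[ni][nj] and p_map[ni][nj] < primary):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            sizes.append(size)
    return sizes


p_map = [
    [0.0001, 0.0002, 0.5, 0.5],
    [0.0003, 0.5,    0.5, 0.0004],
    [0.5,    0.5,    0.5, 0.5],
]
sizes = cluster_extents(p_map)       # step i: primary threshold p < 0.001
kept = [s for s in sizes if s >= 2]  # step ii: keep clusters of at least 2 voxels
print(sizes, kept)  # [3, 1] [3]
```

The extent count treats the voxel at p = 0.0004 the same as one at p = 0.0001, which illustrates the information loss that motivates integrating voxel-level significance into the cluster measure.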