UMD Theses and Dissertations
Permanent URI for this collection: http://hdl.handle.net/1903/3
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a 4-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
271 results
Search Results
Item DEVELOPMENT AND EVALUATION OF SPATIALLY-EXPLICIT POPULATION MODELS FOR ESTIMATING THE ABUNDANCE OF CHESAPEAKE BAY FISHES (2024)
Nehemiah, Samara; Wilberg, Michael J.; Marine-Estuarine-Environmental Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Although fish populations typically experience spatially varying abundance and fishing mortality, stock assessments that inform management decisions commonly model a population that is assumed to be well-mixed with homogeneous mortality rates. When assumptions about population mixing are not met, these models can produce biased estimates. Spatial population estimates are particularly beneficial for the Chesapeake Bay because the region faces unique challenges as a result of climate change and fishing pressure. However, the use of spatial population models for fisheries management relies on models that can provide more accurate estimates of biological parameters than non-spatial models. The objectives of this research were to 1) develop and implement a multi-stock, spatially-explicit population model for Striped Bass (Morone saxatilis) to estimate abundance and fishing mortality in the Chesapeake Bay and along the Atlantic coast; 2) assess the performance of spatially-explicit models relative to spatially-implicit models (i.e., fleets-as-areas) for estimating abundance, determine how improved data quality (e.g., stock composition) affects model performance, and determine the effect of aging error on model accuracy; and 3) determine how spatial model performance is affected by potential changes in population dynamics resulting from climate change (e.g., time-varying natural mortality). The population model was a two-stock model with two sub-annual time-steps and two regions, with stock- and age-specific occupancy probabilities representing movement into and out of the Chesapeake Bay.
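The two-region structure described above can be illustrated with a minimal abundance-projection sketch. All parameter values and names below are hypothetical illustrations, not numbers or code from the dissertation; survival follows the standard exponential (Baranov-style) form, with movement applied after mortality in each sub-annual step.

```python
import numpy as np

# Hypothetical two-region projection (regions: Bay, Ocean).
M_NAT = 0.15                      # natural mortality rate (assumed)
F = np.array([0.10, 0.25])        # fishing mortality by region (assumed)
# Row i, column j: probability of moving from region i to region j.
move = np.array([[0.7, 0.3],
                 [0.4, 0.6]])

def project(n, steps):
    """Project regional abundance forward by `steps` sub-annual steps."""
    for _ in range(steps):
        survived = n * np.exp(-(F + M_NAT))  # exponential survival
        n = survived @ move                  # redistribute among regions
    return n

n0 = np.array([1000.0, 500.0])    # initial abundance [Bay, Ocean]
print(project(n0, 2))
```

Because each row of the movement matrix sums to one, movement only redistributes fish; total abundance declines solely through mortality, which is the property a spatially-explicit assessment exploits when estimating region-specific fishing mortality.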
Fishing mortality was estimated to be higher in the Ocean than in the Chesapeake Bay, and abundance increased during 1982-2004 for both stocks before declining slightly until 2017. Simulations were conducted to test the ability of the models to estimate abundance and fishing mortality under alternative scenarios of data availability and quality. Spatially-explicit estimates were approximately unbiased when they closely matched the assumptions of the data-generating model. Models that ignored potential aging bias in the datasets produced highly biased estimates of abundance and fishing mortality. Although the performance of all models degraded under most climate change scenarios, spatially-explicit models produced more accurate estimates than fleets-as-areas models. This research highlights the potential benefits of implementing spatially-explicit population models for Striped Bass and other ecologically valuable fish species in the Chesapeake Bay.

Item Variable selection and causal discovery methods with application in noncoding RNA regulation of gene expression (2024)
Ke, Hongjie; Ma, Tianzhou; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Noncoding RNAs (ncRNAs), including long noncoding RNAs (lncRNAs) and microRNAs (miRNAs), are critical regulators that control gene expression at multiple levels. Revealing how ncRNAs regulate their target genes in disease-associated pathways will provide mechanistic insights into disease and has potential clinical use. In this dissertation, we developed novel variable selection and causal discovery methods to study the regulatory relationship between ncRNAs and genes. In Chapter 2, we proposed a novel screening method based on robust partial correlation to identify noncoding RNA regulators of gene expression across the whole genome.
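A screening statistic of the kind just described can be sketched generically: compute, for each candidate ncRNA, a rank-based partial correlation with the target gene after adjusting for confounders, then keep the top-ranked candidates. This is an illustrative sketch of the general technique, not the dissertation's exact estimator, and all variable names are invented.

```python
import numpy as np
from scipy import stats

def robust_partial_corr(x, y, z):
    """Spearman correlation between x and y after removing the linear
    effect of confounders z from each; the rank-based step gives
    robustness to heavy tails and monotone transformations."""
    z1 = np.column_stack([np.ones(len(z)), z])
    rx = x - z1 @ np.linalg.lstsq(z1, x, rcond=None)[0]
    ry = y - z1 @ np.linalg.lstsq(z1, y, rcond=None)[0]
    rho, pval = stats.spearmanr(rx, ry)
    return rho, pval

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 3))                                # covariates
x = z @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=200)    # ncRNA expression
y = 2.0 * x + z @ np.array([0.3, 0.1, -0.4]) + rng.normal(size=200)  # gene
rho, pval = robust_partial_corr(x, y, z)
print(rho, pval)  # strong positive partial association expected
```

Genome-wide screening would apply this statistic to every candidate regulator and retain those exceeding a threshold, which is the usual sure-screening pattern in this literature.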
In Chapter 3, we developed a computationally efficient two-stage Bayesian Network (BN) learning method to construct ncRNA-gene regulatory networks from transcriptomic data of both coding genes and noncoding RNAs. We provided a novel analytical platform with a graphical user interface (GUI) covering the entire pipeline of data preprocessing, network construction, module detection, visualization, and downstream analyses to accompany the developed BN learning method. In Chapter 4, we proposed a Bayesian indicator variable selection model with hierarchical structure to uncover how the regulatory mechanism between noncoding RNAs and genes changes across different biological conditions (e.g., cancer stages). In Chapter 5, we discussed potential extensions and future work. This dissertation presents computationally efficient and statistically rigorous methods that can jointly analyze high-dimensional noncoding RNA and gene expression data to investigate their regulatory relationships, which will deepen our understanding of the molecular mechanisms of disease.

Item Advancements in Small Area Estimation Using Hierarchical Bayesian Methods and Complex Survey Data (2024)
Das, Soumojit; Lahiri, Partha; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This dissertation addresses critical gaps in the estimation of multidimensional poverty measures for small areas and proposes innovative hierarchical Bayesian estimation techniques for finite population means in small areas. It also explores specialized applications of these methods for survey response variables with multiple categories. The dissertation presents a comprehensive review of relevant literature and methodologies, highlighting the importance of accurate estimation for evidence-based policymaking.
In Chapter 2, the focus is on the estimation of multidimensional poverty measures for small areas, filling an essential research gap. Using Bayesian methods, the dissertation demonstrates how multidimensional poverty rates and the relative contributions of different dimensions can be estimated for small areas. The proposed approach can be extended to various definitions of multidimensional poverty, including counting or fuzzy set methods. Chapter 3 introduces a novel hierarchical Bayesian estimation procedure for finite population means in small areas, integrating primary survey data with diverse sources, including social media data. The approach incorporates sample weights and factors influencing the outcome variable to reduce sampling informativeness. It demonstrates reduced sensitivity to model misspecifications and diminishes reliance on assumed models, making it versatile for various estimation challenges. In Chapter 4, the dissertation explores specialized applications for survey response variables with multiple categories, addressing the impact of biased or informative sampling on assumed models. It proposes methods for accommodating survey weights seamlessly within the modeling and estimation processes, conducting a comparative analysis with Multilevel Regression with Poststratification (MRP).
The dissertation concludes by summarizing key findings and contributions from each chapter, emphasizing implications for evidence-based policymaking and outlining future research directions.

Item A Mean-Parameterized Conway–Maxwell–Poisson Multilevel Item Response Theory Model for Multivariate Count Response Data (2024)
Strazzeri, Marian Mullin; Yang, Ji Seung; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Multivariate count data arise frequently in the process of measuring a latent construct in human development, psychology, medicine, education, and the social sciences. Examples include the number of different types of mistakes a student makes when reading a passage of text, or the number of nausea, vomiting, diarrhea, and/or dysphagia episodes a patient experiences in a given day. These response data are often sampled from multiple sources and/or in multiple stages, yielding a multilevel data structure with lower-level sampling units (e.g., individuals, such as students or patients) nested within higher-level sampling units or clusters (e.g., schools, clinical trial sites, studies). Motivated by real data, a new Item Response Theory (IRT) model is developed for the integrative analysis of multivariate count data. The proposed mean-parameterized Conway–Maxwell–Poisson Multilevel IRT (CMPmu-MLIRT) model differs from currently available models in its ability to yield sound inferences when applied to multilevel, multivariate count data, where exposure (the length of time, space, or number of trials over which events are recorded) may vary across individuals, and items may provide different amounts of information about an individual's level of the latent construct being measured (e.g., level of expressive language development, math ability, disease severity).
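The Conway–Maxwell–Poisson distribution underlying the model has pmf proportional to λ^y / (y!)^ν, where ν > 1 gives underdispersion, ν = 1 recovers the Poisson, and ν < 1 gives overdispersion; mean parameterization replaces the rate λ with the mean μ by inverting the mean function. The sketch below illustrates that idea numerically (truncated series plus bisection); it is a generic illustration, not the dissertation's estimation code.

```python
import math

def cmp_pmf(lam, nu, ymax=200):
    """Conway-Maxwell-Poisson pmf via the truncated normalizing series,
    computed on the log scale for numerical stability."""
    logw = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(ymax + 1)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    z = sum(w)
    return [p / z for p in w]

def cmp_mean(lam, nu, ymax=200):
    return sum(y * p for y, p in enumerate(cmp_pmf(lam, nu, ymax)))

def lam_for_mean(mu, nu, lo=1e-6, hi=1e4, iters=60):
    """Mean parameterization: bisect for the rate lambda whose CMP mean
    equals mu (the CMP mean is increasing in lambda, so this is safe)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)   # bisect on the log scale
        if cmp_mean(mid, nu) < mu:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

lam = lam_for_mean(4.0, 1.5)       # target mean 4, underdispersed item
print(round(cmp_mean(lam, 1.5), 3))  # → 4.0
```

In an IRT setting the mean μ would itself be modeled as a function of exposure, item parameters, and the latent construct, which is why a mean (rather than rate) parameterization makes the regression structure interpretable.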
Estimation feasibility is demonstrated through a Monte Carlo simulation study evaluating parameter recovery across various salient conditions. Mean parameter estimates are shown to be well aligned with true parameter values when a sufficient number of items (e.g., 10) are used, while recovery of dispersion parameters may be challenging when as few as 5 items are used. In a second Monte Carlo simulation study, to demonstrate the need for the proposed CMPmu-MLIRT model over currently available alternatives, the impact of CMPmu-MLIRT model misspecification is evaluated with respect to model parameter estimates and corresponding standard errors. Treating an exposure that varies across individuals as though it were fixed is shown to notably overestimate item intercept and slope estimates and, when substantial variability in the latent construct exists among clusters, to underestimate said variance. Misspecifying the number of levels (i.e., fitting a single-level model to multilevel data) is shown to overestimate item slopes, especially when substantial variability in the latent construct exists among clusters, and to compound the overestimation of item slopes when a varying exposure is also misspecified as fixed. Misspecifying the conditional item response distributions as Poisson for underdispersed items and negative binomial for overdispersed items is shown to bias estimates of between-cluster variability in the latent construct.
Lastly, the applicability of the proposed CMPmu-MLIRT model to empirical data was demonstrated in an integrative data analysis of oral language samples.

Item The Critical Race Framework Study: Standardizing Critical Evaluation for Research Studies That Use Racial Taxonomy (2024)
Williams, Christopher M.; Fryer, Craig S.; Public and Community Health; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Introduction: Race is one of the most common variables in public health surveillance and research. Yet studies involving racial measures show poor conceptual clarity and inconsistent operational definitions. The public health literature lacks a bias tool for structured qualitative evaluation of the core areas of critical appraisal (reliability, validity, internal validity, and external validity) for studies that use racial taxonomy. This study developed the Critical Race (CR) Framework to address this major gap in the literature. Methods: The study involved three iterative phases to answer five research questions (RQs). Phase I was a pilot study of the CR Framework among public health faculty and doctoral students to assess measures of fit (RQ1) and to identify areas of improvement in training, instrumentation, and study design (RQ2). Study participants received training and performed a single article evaluation. Phase II was a national cross-sectional study of public health experts to assess perceptions of the revised training and tool against measures of fit (RQ1), to determine the influence of demographic and research factors on perceptions (RQ3), and to gather validity evidence on constructs (RQ4). In Phase III, three raters performed article evaluations to support reliability evidence (RQ4) and to determine the quality of health disparities and behavioral health research studies against the CR Framework (RQ5).
Analysis: We assessed the reliability of study results and the CR Framework using non-differentiation analysis, thematic analysis, missingness analysis, user data, measures of internal consistency for adopted instruments, interrater agreement, and interrater reliability. Validity was assessed using content validity (CVI and k*), construct validity, and exploratory factor analysis (EFA). Results: The study recruited 30 highly skilled public health experts across its three phases as part of the final analytic sample. Phase I showed poor reliability, so its results could not be confidently interpreted (RQ1), and it indicated needed improvements in study design, training, and instrumentation (RQ2). Based on Phase II results, we met or exceeded acceptable thresholds for the measures of fit (acceptability, appropriateness, feasibility, and satisfaction) (RQ1). Demographic and research factors were not associated with responses (RQ3). Interrater agreement was moderate to high among rater pairs (RQ4). Due to lack of confidence in significance testing, interrater reliability results were inconclusive. Overall, the data showed excellent content validity. Based on EFA results, construct validity for reliability and validity items was poor to fair (RQ4). Results were inconclusive on internal validity and external validity. The twenty studies used in critical appraisal showed low quality or no discussion of these areas when evaluated with the Critical Race Framework (RQ5). Discussion: The CR Framework study developed a tool and training with quality evidence for implementation effectiveness, content validity, and interrater reliability, filling a major gap in the public health literature. It contributed an innovative theory-based tool and training to the literature.
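Interrater agreement of the kind reported above is commonly summarized with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The following is a generic sketch of that statistic with made-up ratings, not the study's analysis code.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters' categorical labels."""
    assert len(r1) == len(r2)
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n       # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    labels = set(c1) | set(c2)
    pe = sum(c1[k] * c2[k] for k in labels) / n**2     # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical quality ratings from two raters on six articles.
r1 = ["high", "low", "low", "high", "low", "high"]
r2 = ["high", "low", "high", "high", "low", "high"]
print(round(cohens_kappa(r1, r2), 3))  # → 0.667
```

Values around 0.4-0.6 are conventionally read as moderate agreement and 0.6-0.8 as substantial, which is the scale behind "moderate to high" summaries of rater pairs.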
Future research should seek to study individual perceptions and practices that influence outcomes of CR Framework application and to reduce barriers so that minimum sample sizes can be met for additional testing.

Item Essays on Mental Health, Education, and Parental Labor Force Participation (2024)
Nesbit, Rachel; Kuersteiner, Guido; Pope, Nolan; Economics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This dissertation consists of three chapters in empirical microeconomics. The first chapter focuses on mental health in the criminal justice system. I show that mandated mental health treatment during probation decreases future recidivism, and further that paying for these probationers to receive treatment would be a very cost-effective program. The second chapter focuses on the labor supply of same-sex couples. My coauthors and I document the earnings patterns in same-sex couples after the entrance of their first child and contrast them with the earnings patterns in opposite-sex couples. The third chapter evaluates state-level policies to offer a college admissions exam (either the SAT or ACT) free to all high school students. I estimate precise null effects of the policies on future college attendance. The three chapters are described in further detail below. Chapter 1. Mental health disorders are particularly prevalent among those in the criminal justice system and may be a contributing factor in recidivism. Using North Carolina court cases from 1994 to 2009, this chapter evaluates how mandated mental health treatment as a term of probation affects the likelihood that individuals return to the criminal justice system. I use random variation in judge assignment to compare those who were required to seek weekly mental health counseling to those who were not.
The main findings are that being assigned to seek mental health treatment decreases the likelihood of three-year recidivism by about 12 percentage points, or 36 percent. This effect persists over time and is similar among various types of individuals on probation. In addition, I show that mental health treatment operates distinctly from drug addiction interventions in a multiple-treatment framework. I provide evidence that mental health treatment's longer-term effectiveness is strongest among more financially advantaged probationers, consistent with this setting, in which the cost of mandated treatment is shouldered by offenders. Finally, conservative calculations yield a 5:1 benefit-to-cost ratio, which suggests that the treatment-induced decrease in future crime would be more than sufficient to offset the costs of treatment. Chapter 2. Existing work has shown that the entry of a child into a household results in a large and sustained increase in the earnings gap between male and female partners in opposite-sex couples. Potential reasons for this include work-life preferences, comparative advantage over earnings, and gender norms. We expand this analysis of the child penalty to examine earnings of individuals in same-sex couples in the U.S. around the time their first child enters the household. Using linked survey and administrative data and event-study methodology, we confirm earlier work finding a child penalty for women in opposite-sex couples. We find this is true even when the female partner is the primary earner pre-parenthood, lending support to the importance of gender norms in opposite-sex couples. By contrast, in both female and male same-sex couples, earnings changes associated with child entry differ by the relative pre-parenthood earnings of the partners: secondary earners see an increase in earnings, while on average the earnings of primary and equal earners remain relatively constant.
While this finding seems supportive of a norm related to equality within same-sex couples, transition analysis suggests a more complicated story. Chapter 3. Since 2001, more than half of US states have implemented policies that require all public high schools to administer either the ACT or SAT to juniors during the school day, free of charge, making that aspect of the college application process less costly in both time and money. I evaluate these policies using American Community Surveys (ACS) from 2000 to 2019. I augment ACS data with the Census Master Address File to precisely identify the state in which individuals took the exam. Exploiting variation in policy implementation across states and time, I find across all specifications that increased access to standardized college entrance exams has no effect on subsequent college attendance. It also does not shift students between public and private colleges or between two- and four-year programs. The results of this chapter suggest that, to the extent that these policies were introduced to encourage college-going among marginal students, they did not accomplish their goal.
This provides evidence about the kinds of support necessary to influence educational outcomes for students from disadvantaged families.

Item Structured discovery in graphs: Recommender systems and temporal graph analysis (2024)
Peyman, Sheyda Do'a; Lyzinski, Vince V.; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Graph-valued data arise in numerous diverse scientific fields, ranging from sociology, epidemiology, and genomics to neuroscience and economics. For example, sociologists have used graphs to examine the roles of user attributes (gender, class, year) at American colleges and universities through the study of Facebook friendship networks and have studied segregation and homophily in social networks; epidemiologists have recently modeled Human-nCov protein-protein interactions via graphs; and neuroscientists have used graphs to model neuronal connectomes. Graph structure, including latent features, relationships between vertices, and the importance of each vertex, comprises properties that are central to graph analysis and inference. While it is common to imbue nodes and/or edges with explicitly observed numeric or qualitative features, in this work we consider latent network features that must be estimated from the network topology. The main focus of this text is finding ways to extract the latent structure in the presence of network anomalies. These anomalies occur in different scenarios: cases in which the graph is subject to an adversarial attack and the anomaly inhibits inference, and cases in which detecting the anomaly is itself the key inference task. The former case is explored in the context of vertex nomination information retrieval, where we consider both analytic methods for countering the adversarial noise and the addition of a user-in-the-loop in the retrieval algorithm to counter potential adversarial noise.
In the latter case, we use graph embedding methods to discover sequential anomalies in network time series.

Item STATISTICAL DATA FUSION WITH DENSITY RATIO MODEL AND EXTENSION TO RESIDUAL COHERENCE (2024)
Zhang, Xuze; Kedem, Benjamin; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Nowadays, the statistical analysis of data from diverse sources has become more prevalent. The Density Ratio Model (DRM) is one method for fusing and analyzing such data. The population distributions of different samples can be estimated based on the fused data, which leads to more precise estimates of the probability distributions. These probability distributions are related by assuming the ratios of their probability density functions (PDFs) follow a parametric form. In previous work, this parametric form was assumed to be uniform across all ratios. In Chapter 1, an extension is made to allow this parametric form to vary for different ratios. Two methods of determining the parametric form for each ratio are developed, based on an asymptotic test and the Akaike Information Criterion (AIC). This extended DRM is applied to Radon concentrations and Pertussis rates to demonstrate the use of this extension in the univariate and multivariate cases, respectively. The above analysis is possible when the data in each sample are independent and identically distributed (IID). However, in many cases statistical analysis is required for time series, in which data are sequentially dependent. In Chapter 2, an extension is made for the DRM to account for weakly dependent data, which allows us to investigate the structure of multiple time series on the strength of each other. It is shown that the IID assumption can be replaced by appropriate stationarity, mixing, and moment conditions. This extended DRM is applied to the analysis of air quality data recorded in chronological order.
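The density-ratio assumption described above can be written out explicitly. With a reference sample whose density is $g_0$ and additional samples with densities $g_1, \dots, g_m$, the classical DRM posits

```latex
\frac{g_k(x)}{g_0(x)} = \exp\{\alpha_k + \beta_k^{\top} h(x)\}, \qquad k = 1, \dots, m,
```

where $h$ is a known tilt function (e.g., $h(x) = x$ or $h(x) = (x, x^2)^{\top}$) common to all ratios. The Chapter 1 extension replaces the common $h$ with a ratio-specific $h_k$ selected by asymptotic testing or AIC. (The notation here follows the standard DRM literature rather than the dissertation itself.)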
As mentioned above, the DRM is suitable for situations where we investigate a single time series with the aid of multiple alternative ones. These time series are assumed to be mutually independent. However, in time series analysis it is often of interest to detect linear and nonlinear dependence between different time series. In such dependent scenarios, coherence is a common tool for measuring the linear dependence between two time series, and residual coherence is used to detect a possible quadratic relationship. In Chapter 3, we extend the notion of residual coherence and develop statistical tests for detecting linear and nonlinear associations between time series. These tests are applied to the analysis of brain functional connectivity data.

Item The Shuffling Effect: Vertex Label Error's Impact on Hypothesis Testing, Classification, and Clustering in Graph Data (2024)
Saxena, Ayushi; Lyzinski, Vince; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The increasing prevalence of graph- and network-valued data across various disciplines has prompted significant interest and research in recent years. This dissertation explores the impact of vertex shuffling, or vertex misalignment, on the statistical network inference tasks of hypothesis testing, classification, and clustering. Our focus is within the framework of multiple-network inference, where existing methodologies often assume known vertex correspondence across networks. This assumption frequently does not hold in practice. Through theoretical analyses, simulations, and experiments, we aim to reveal the effects of vertex shuffling on different types of performance. Our investigation begins with an examination of two-sample network hypothesis testing, focusing on the decrease in statistical power resulting from vertex shuffling. In this work, our analysis focuses on the random dot product and stochastic block model network settings.
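Vertex shuffling amounts to relabeling one graph's vertices by an unknown permutation: the observed adjacency matrix becomes $PAP^\top$ for a permutation matrix $P$. A minimal sketch (illustrative only, with an arbitrary random graph):

```python
import numpy as np

rng = np.random.default_rng(1)

# A small undirected graph as a symmetric 0/1 adjacency matrix.
n = 6
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T

# Shuffle vertex labels: B = P A P^T for a permutation matrix P.
perm = rng.permutation(n)
P = np.eye(n)[perm]
B = P @ A @ P.T

# The edge structure is intact (B is A with rows/cols permuted) ...
assert np.array_equal(B, A[np.ix_(perm, perm)])
# ... but entrywise comparison against A, as a test statistic would make
# under an assumed vertex correspondence, no longer lines up in general.
print(np.array_equal(A, B))
```

Because shuffling preserves the edge set but scrambles the assumed correspondence, statistics that compare networks entrywise lose power, which is exactly the degradation the dissertation quantifies.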
Subsequent chapters delve into the effects of shuffling on graph classification and clustering, showcasing how misalignment negatively impacts accuracy in categorizing and clustering graphs (and vertices) based on their structural characteristics. Various machine learning algorithms and clustering methodologies are explored, revealing a consistent theme of performance degradation in the presence of vertex shuffling. We also explore how graph matching algorithms can potentially mitigate the effects of vertex misalignment and recover the lost performance. Our findings also highlight the risk of graph matching as a pre-processing tool, as it can induce artificial signal. These findings underscore the difficulties and subtleties of addressing vertex shuffling across multiple network inference tasks and suggest avenues for future research to enhance the robustness of statistical inference methodologies in complex network environments.

Item DEVELOPMENT AND APPLICATION OF PROPINQUITY MODELING FRAMEWORK FOR IDENTIFICATION AND ANALYSIS OF EXTREME EVENT PATTERNS (2024)
Kholodovsky, Vitaly; Liang, Xin-Zhong; Atmospheric and Oceanic Sciences; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Extreme weather and climate events such as floods, droughts, and heat waves can cause extensive societal damage. While various statistical and climate models have been developed for the purpose of simulating extremes, a consistent definition of extreme events is still lacking. Furthermore, to better assess the performance of climate models, a variety of spatial forecast verification measures have been developed. However, in most cases the spatial verification measures that are widely used to compare mean states do not have sufficient theoretical justification for benchmarking extreme events.
In order to alleviate inconsistencies when defining extreme events within different scientific communities, we propose a new generalized Spatio-Temporal Threshold Clustering method for the identification of extreme event episodes, which uses machine learning techniques to couple existing pattern recognition indices with high or low threshold choices. The method consists of five main steps: construction of essential field quantities, dimension reduction, spatial domain mapping, time series clustering, and threshold selection. We develop and apply this method using a gridded daily precipitation dataset derived from rain gauge stations over the contiguous United States. We observe changes in the distribution of conditional frequency of extreme precipitation from large-scale, well-connected spatial patterns to smaller-scale, more isolated rainfall clusters, possibly leading to more localized droughts and heatwaves, especially during the summer months. Additionally, we compare empirical and statistical probabilities and intensities obtained through the Conventional Location Specific methods, which are deficient in geometric interconnectivity between individual spatial pixels and independent in time, with a new Propinquity modeling framework. We integrate the Spatio-Temporal Threshold Clustering algorithm and the conditional semi-parametric Heffernan and Tawn (2004) model into the Propinquity modeling framework to separate classes of models that can calculate process level dependence of large-scale extreme processes, primarily through the overall extreme spatial field. Our findings reveal significant differences between Propinquity and Conventional Location Specific methods, in both empirical and statistical approaches in shape and trend direction. 
We also find that the process of aggregating model results without considering interconnectivity between individual grid cells for trend construction can lead to significant variations in the overall trend pattern and direction compared with models that do account for interconnectivity. Based on these results, we recommend avoiding such practices and instead adopting the Propinquity modeling framework or other spatial EVA models that take into account the interconnectivity between individual grid cells. Our aim for the final application is to establish a connection between extreme essential field quantity intensity fields and large-scale circulation patterns. However, the Conventional Location Specific Threshold methods are not appropriate for this purpose as they are memoryless in time and not able to identify individual extreme episodes. To overcome this, we developed the Feature Finding Decomposition algorithm and used it in combination with the Propinquity modeling framework. The algorithm consists of the following three steps: feature finding, image decomposition, and large-scale circulation patterns connection. Our findings suggest that the Western Pacific Index, particularly its 5th percentile and 5th mode of decomposition, is the most significant teleconnection pattern that explains the variation in the trend pattern of the largest feature intensity.
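The spatial side of threshold-based extreme-event identification can be illustrated with connected-component labeling of a threshold exceedance field. This generic sketch is not the dissertation's five-step algorithm (which also involves dimension reduction, time series clustering, and threshold selection); the grid values and threshold below are invented.

```python
import numpy as np
from scipy import ndimage

# Hypothetical daily precipitation field on a small grid (mm).
field = np.array([
    [ 2.,  3., 25., 30.,  1.],
    [ 1., 28., 27.,  2.,  0.],
    [ 0.,  1.,  0., 35., 40.],
    [ 5.,  0.,  2., 33.,  1.],
])

threshold = 20.0                       # high-threshold choice (assumed)
exceed = field > threshold             # binary exceedance field

# Group contiguous exceedance pixels into candidate event clusters
# (default 4-connectivity), then measure each cluster's size.
labels, n_clusters = ndimage.label(exceed)
sizes = ndimage.sum_labels(exceed, labels, index=range(1, n_clusters + 1))
print(n_clusters, sizes)               # two clusters, of sizes 4 and 3
```

Tracking such labeled clusters through time is what turns per-pixel exceedances into spatially coherent "event episodes," in contrast to location-specific methods that treat each grid cell independently.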