Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay in the appearance of a given thesis/dissertation in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
5 results
Search Results
Item
Advancements in Small Area Estimation Using Hierarchical Bayesian Methods and Complex Survey Data (2024)
Das, Soumojit; Lahiri, Partha; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This dissertation addresses critical gaps in the estimation of multidimensional poverty measures for small areas and proposes innovative hierarchical Bayesian estimation techniques for finite population means in small areas. It also explores specialized applications of these methods for survey response variables with multiple categories. The dissertation presents a comprehensive review of relevant literature and methodologies, highlighting the importance of accurate estimation for evidence-based policymaking. In Chapter 2, the focus is on the estimation of multidimensional poverty measures for small areas, filling an essential research gap. Using Bayesian methods, the dissertation demonstrates how multidimensional poverty rates and the relative contributions of different dimensions can be estimated for small areas. The proposed approach can be extended to various definitions of multidimensional poverty, including counting or fuzzy set methods. Chapter 3 introduces a novel hierarchical Bayesian estimation procedure for finite population means in small areas, integrating primary survey data with diverse sources, including social media data. The approach incorporates sample weights and factors influencing the outcome variable to reduce sampling informativeness. It demonstrates reduced sensitivity to model misspecifications and diminishes reliance on assumed models, making it versatile for various estimation challenges. In Chapter 4, the dissertation explores specialized applications for survey response variables with multiple categories, addressing the impact of biased or informative sampling on assumed models.
It proposes methods for accommodating survey weights seamlessly within the modeling and estimation processes, conducting a comparative analysis with Multilevel Regression with Poststratification (MRP). The dissertation concludes by summarizing key findings and contributions from each chapter, emphasizing implications for evidence-based policymaking and outlining future research directions.
Item
GENERALIZED OBSERVED BEST PREDICTION WITH EMPIRICAL BAYES PARAMETRIC BOOTSTRAP MODEL BUILDING (2020)
WALDRON, WILLIAM; Lahiri, Partha; Mathematics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The observed best predictor (OBP) has recently been offered as a more robust alternative to the remarkable empirical best linear unbiased predictor (EBLUP). Although the latter has become a pervasive tool among applied statisticians, there are critical reasons why the OBP should almost always be used in conjunction with the EBLUP. In particular, mathematical models are often oversimplified or misspecified, lacking key predictors within the available set of data. For more complex models such as time-series applications, model robustness becomes even more imperative. We will provide some results related to the OBP theory and introduce a generalized, or weighted, version of the OBP for different loss functions. This will first be defined on the Fay-Herriot model and then extended to the General Linear Mixed model. Finally, we will apply the best predictive estimator (BPE) to both parameter coefficients and variance parameters within the Fay-Herriot and cross-sectional time series models. Model building strategies abound, and have continued to evolve. These are instrumental for applied statisticians and analysts passing judgement on whether statistical models are suitable for drawing conclusions or producing official estimates.
A number of methodologies and approaches have been developed to consider this critical question of model selection and diagnostics. We endeavor to view this problem from the perspective of empirical Bayes (EB), in a fashion similar to the EBLUP. As such, we define and develop an EB parametric bootstrap approach not only to estimate mean squared error, but also for finding the best model from a set of candidates (e.g., variable selection). This could be done for general criteria by considering leave-one-out predictive distributions. Once a viable model is selected, we can continue the model-building process by performing appropriate validation. Thus, the method is not only versatile, but has some computational advantages over other model building strategies.
Item
Statistical Inference Using Data From Multiple Files Combined Through Record Linkage (2018)
HAN, YING; Lahiri, Partha; Mathematical Statistics; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Record linkage methods help us combine multiple data sets from different sources when a single data set with all necessary information is unavailable or when data collection on additional variables is time-consuming and extremely costly. Linkage errors are inevitable in the linked data set because of the unavailability of an error-free and unique identifier and because of possible errors in measuring or recording. It has been realized that even a small number of linkage errors can lead to substantial bias and increased variability in estimating the parameters of a statistical model. The importance of incorporating uncertainty of the record linkage process into the statistical analysis step cannot be overemphasized. The current research is mainly focused on the regression analysis of the linked data. The record linkage and statistical analysis processes are treated as two separate steps.
Due to the limited information about the record linkage process, simplifying assumptions on the linkage mechanism have to be made. In reality, however, these assumptions may be violated. Also, most of the existing linkage error models are built on the linked data set, which only contains records for the designated links. Information about linkage errors carried by the designated non-links is missing. In the dissertation, we provide general methodologies for both regression analysis and small area estimation using data from multiple files. A general integrated model is proposed to combine the record linkage and statistical analysis processes. The proposed linkage error models are built directly on the data values from the original sources, and based on the actual record linkage method that is used. We have adapted the jackknife methods to estimate bias, variance, and mean squared error of our proposed estimators. To illustrate the general methodology, we give one example of estimating the regression coefficients in the linear and logistic regression models, and another example of estimating a small area mean under the nested-error linear regression model. In order to reduce the computational burden, simplified versions of the proposed estimators, jackknife methods, and numerical algorithms are given. A Monte Carlo simulation study is devised to evaluate the performance of the proposed estimators and to investigate the difference between the standard and simplified jackknife methods.
Item
Hierarchical Bayesian Estimation of Small Area Means Using Complex Survey Data (2013)
Ha, Neung Soo; Lahiri, Partha; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In survey data analysis, there are two main approaches, design-based and model-based, for making inferences about different characteristics of the population.
A design-based approach tends to produce unreliable estimates for small geographical regions or cross-classified demographic regions due to the small sample sizes. Moreover, when there are no samples available in those areas, a design-based method cannot be used. In the case of estimating population characteristics for a small area, model-based methods are used. They provide a flexible modeling method that can incorporate relevant information from similar areas and external databases. To provide suitable estimates, many model building techniques, both frequentist and Bayesian, have been developed, and when the model-based method makes explicit use of prior distributions on the hyperparameters, inference can be carried out in the Bayesian paradigm. For estimating small area proportions, mixed models are often used because of the flexibility in combining information from different sources and of the tractability of error sources. Mixed models are categorized into two broad classes, area-level and unit-level models, and the use of either model depends on the availability of information. Generally, estimation of small area proportions with the hierarchical Bayes (HB) method involves a transformation of the direct survey-weighted estimates that stabilizes the sampling variance. Additionally, it is commonly assumed that the survey-weighted proportion has a normal distribution with a known sampling variance. We find that these assumptions and application methods may introduce some complications. First, the transformation of direct estimates can introduce bias when they are back-transformed for obtaining the original parameter of interest. Second, transformation of direct estimates can cause additional measures of uncertainty. Third, certain commonly used functions for transformation cannot be used, such as log transformation on a zero survey count. Fourth, applying fixed values for sampling variances may fail to capture the additional variability.
Last, the assumption of normality of the model distribution might be inappropriate when the true parameter of interest lies near the extremities (near 0 or 1). To address these complications, we first expand the Fay-Herriot area-level model for estimating proportions so that it can directly model the survey-weighted proportions without using any transformation functions. Second, we introduce a logit function for the linking model, which is more appropriate for estimating proportions. Third, we model the sampling variance to capture the additional variability. Additionally, we develop a model that can be used for modeling the survey-weighted counts directly. We also explore a new benchmarking approach for the estimates. Estimates are benchmarked when the aggregate of the estimates from the smaller regions matches that of the corresponding larger region. The benchmarking techniques involve a number of constraints. Our approach introduces a simple method that can be applied to complicated constraints when applying a traditional method may fail. Finally, we investigate the "triple-goal" estimation method that can concurrently achieve the three specific goals relatively well as an ensemble.
Item
Selected Problems in Multi-Sample Statistical Inference (2012)
Franco, Carolina; Kagan, Abram M; Applied Mathematics and Scientific Computation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
In Chapter 1, a natural semiparametric model for case-control study data is discussed, and the asymptotic properties of two simple methods of estimation are explored. The probability element of the model can be factored into a known positive function h involving the finite-dimensional structural parameter, an infinite-dimensional nuisance parameter in the form of the probability element dP of a distribution, and a normalizing constant.
In the setup of interest, a sample of size n is available from a population with a distribution from the aforementioned model. A second, independent sample of size m provides information only about the infinite-dimensional nuisance parameter P. The methods of estimation involve replacing the infinite-dimensional parameter with the empirical distribution function based on the second sample, and constructing semiparametric analogs of the maximum likelihood estimator and the method of moments estimator. The simplicity of these semiparametric estimators permits analysis of their asymptotic distribution even when n and m grow at different rates, yielding very natural and interpretable asymptotic results. In the case where n = o(m), the analog of the maximum likelihood estimator is asymptotically efficient. Chapter 2 explores a related parametric asymptotic statistics problem. Suppose a sample of size m is available from a population with density f_Y(y; λ), and an independent sample of size n is available from a population with density f_X(x; λ, α). Here λ is regarded as a nuisance parameter and α is the structural parameter, where λ and α are scalars. One approach to estimation of α would be to compute the maximum likelihood estimator based on both samples. A second approach would be to first find the maximum likelihood estimator of λ from the first sample, and to then treat it as the true parameter when using maximum likelihood estimation based on the second sample. Chapter 2 compares the asymptotic behavior of these two estimators under different assumptions about the rate of growth of m relative to n. In Chapter 3 we consider interval estimation for small area proportions based on data collected under stratified random sampling. We focus on the case where the stratum sample sizes and the true proportions are small for all strata, and for simplicity we assume equal stratum sample sizes.
The objective is to construct a confidence interval for each of the true stratum proportions, P_i. A commonly used small area empirical Bayes model for a single stratum's true proportion P_i assumes that the distributions of the sampled stratum proportions and the prior distribution of the true stratum proportions are normal. The well-documented problems of the normal approximation to the binomial, particularly when the sample size is small and the probability of success is close to 0 or 1, raise questions about the adequacy of such a model when the P_i and the stratum sample sizes are small. We argue that a more reasonable model in this setting is to assume that the sampled stratum counts have binomial distributions and that the prior distribution of the true stratum proportions follows a beta distribution. We propose a new empirical Bayes confidence interval based on this model, and examine related simulation results.
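As a rough illustration of the beta-binomial setting described in the last abstract, the sketch below fits a Beta(a, b) prior by the method of moments from stratum counts with a common sample size n, then forms a naive interval from the posterior Beta(a + y_i, b + n - y_i). This is not the interval proposed in the dissertation: the function names, the clamping constants, and the example counts are all hypothetical, and the interval ignores the uncertainty in the estimated a and b.

```python
import random
from statistics import mean, variance


def fit_beta_moments(counts, n):
    """Method-of-moments estimate of a Beta(a, b) prior from binomial
    counts that share a common stratum sample size n (assumed n > 1)."""
    props = [y / n for y in counts]
    m = mean(props)
    if not 0.0 < m < 1.0:
        raise ValueError("overall proportion must lie strictly in (0, 1)")
    s2 = variance(props)  # sample variance across strata
    # Beta-binomial: Var(y/n) = m(1-m)/n * (1 + (n-1)*rho), rho = 1/(a+b+1)
    rho = (n * s2 / (m * (1.0 - m)) - 1.0) / (n - 1)
    rho = min(max(rho, 1e-3), 0.999)  # illustrative guard, not from the source
    total = 1.0 / rho - 1.0           # a + b
    return m * total, (1.0 - m) * total


def naive_eb_interval(y, n, a, b, level=0.95, draws=20000, seed=0):
    """Equal-tailed interval from Monte Carlo draws of the posterior
    Beta(a + y, b + n - y); hyperparameter uncertainty is ignored."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a + y, b + n - y) for _ in range(draws))
    lower = sample[int((1.0 - level) / 2.0 * draws)]
    upper = sample[int((1.0 + level) / 2.0 * draws) - 1]
    return lower, upper


# Hypothetical data: 10 strata, 20 sampled units each, small true proportions.
counts = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1]
n = 20
a, b = fit_beta_moments(counts, n)
intervals = [naive_eb_interval(y, n, a, b) for y in counts]
```

Unlike a normal-approximation interval, each interval here stays inside (0, 1) even for strata with a zero count, which is the practical appeal of the beta-binomial model when the proportions and sample sizes are small.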