Hierarchical Bayesian Estimation of Small Area Means Using Complex Survey Data
Ha, Neung Soo
In survey data analysis, there are two main approaches, design-based and model-based, for making inferences about characteristics of a population. A design-based approach tends to produce unreliable estimates for small geographical regions or cross-classified demographic domains because of their small sample sizes. Moreover, when no sample is available in an area, a design-based method cannot be used at all. For estimating population characteristics of small areas, model-based methods are therefore used. They provide flexible models that can incorporate relevant information from similar areas and from external databases. Many model-building techniques, both frequentist and Bayesian, have been developed to provide suitable estimates; when the model places explicit prior distributions on the hyperparameters, inference can be carried out in the Bayesian paradigm. For estimating small area proportions, mixed models are often used because they can combine information from different sources and keep the sources of error tractable. Mixed models fall into two broad classes, area-level and unit-level models, and which one is used depends on the information available. Generally, estimating small area proportions with the hierarchical Bayes (HB) method involves a variance-stabilizing transformation of the direct survey-weighted estimates. Additionally, the survey-weighted proportion is commonly assumed to be normally distributed with a known sampling variance. We find that these assumptions and practices can introduce several complications. First, transforming the direct estimates can introduce bias when the estimates are back-transformed to recover the original parameter of interest. Second, transforming the direct estimates can necessitate additional measures of uncertainty.
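The standard area-level setup referred to above, a normal sampling model with known sampling variances, is the basic Fay-Herriot model, and its shrinkage estimator can be sketched as follows. This is a minimal illustration, not the HB procedure of this work: the function `fay_herriot_composite` and the example numbers are hypothetical, and the model variance `A` is treated as known, whereas a hierarchical Bayes analysis would place a prior on it. Each direct estimate y_i is shrunk toward a regression (synthetic) estimate x_i'β with weight γ_i = A / (A + D_i), so areas with larger sampling variance D_i are pulled harder toward the synthetic estimate.

```python
import numpy as np


def fay_herriot_composite(y, X, D, A):
    """Shrinkage estimates under a basic Fay-Herriot area-level model.

    Hypothetical helper for illustration; the model variance A is treated as
    known here, whereas an HB analysis would assign it a prior.

    y : direct survey-weighted estimates, shape (m,)
    X : area-level covariates, shape (m, p)
    D : known sampling variances, shape (m,)
    A : between-area (model) variance, scalar
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)

    # Marginally y_i ~ N(x_i' beta, A + D_i); estimate beta by weighted least squares.
    w = 1.0 / (A + D)
    XtW = X.T * w  # scales column i of X.T by w_i
    beta = np.linalg.solve(XtW @ X, XtW @ y)

    # Shrinkage factor: weight given to the direct estimate in each area.
    gamma = A / (A + D)
    estimates = gamma * y + (1.0 - gamma) * (X @ beta)
    return estimates, gamma, beta


# Three areas, intercept-only model; the middle area is the noisiest.
est, gamma, beta = fay_herriot_composite(
    y=[0.2, 0.5, 0.8], X=np.ones((3, 1)), D=[0.01, 0.04, 0.01], A=0.02
)
print(np.round(est, 3), np.round(gamma, 3), np.round(beta, 3))
```

In this example the middle area has four times the sampling variance of the others, so the weight on its direct estimate drops from 2/3 to 1/3. Nothing in this normal model keeps the estimates inside [0, 1], which relates to the concern about proportions near 0 or 1.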
Third, some commonly used transformations cannot be applied in certain cases; for example, the log transformation is undefined for a zero survey count. Fourth, treating the sampling variances as fixed, known values may fail to capture additional variability. Last, the assumption of normality might be inappropriate when the true parameter of interest lies near the extremes (near 0 or 1). To address these complications, we first extend the Fay-Herriot area-level model for estimating proportions so that it directly models the survey-weighted proportions without any transformation function. Second, we introduce a logit function in the linking model, which is more appropriate for estimating proportions. Third, we model the sampling variance to capture the additional variability. Additionally, we develop a model for the survey-weighted counts directly. We also explore a new benchmarking approach for the estimates. Estimates are said to be benchmarked when the aggregate of the estimates for the smaller regions matches the estimate for the corresponding larger region. Benchmarking techniques involve a number of constraints, and our approach introduces a simple method that can handle complicated constraints where traditional methods may fail. Finally, we investigate the "triple-goal" estimation method, which can concurrently achieve three specific goals relatively well as an ensemble (typically, good estimates of the individual small area parameters, of their ranks, and of their empirical distribution).
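For context on the benchmarking constraint just described, the two traditional adjustments below, ratio and difference benchmarking, force a weighted aggregate of small area estimates to equal a larger-region estimate. This is a minimal sketch of standard techniques, not the new benchmarking approach of this work; the function names, the population-share weights, and the benchmark value are illustrative assumptions.

```python
import numpy as np


def ratio_benchmark(estimates, weights, target):
    """Rescale estimates so that sum(weights * estimates) equals target."""
    est = np.asarray(estimates, dtype=float)
    w = np.asarray(weights, dtype=float)
    aggregate = w @ est
    if aggregate == 0:
        raise ValueError("weighted aggregate is zero; ratio adjustment undefined")
    return est * (target / aggregate)


def difference_benchmark(estimates, weights, target):
    """Shift all estimates by a common amount so the weighted aggregate equals target."""
    est = np.asarray(estimates, dtype=float)
    w = np.asarray(weights, dtype=float)
    shift = (target - w @ est) / w.sum()
    return est + shift


# Hypothetical small area proportions, population shares, and a larger-region estimate.
small = [0.3, 0.5, 0.7]
shares = [0.5, 0.3, 0.2]
large_region = 0.5

ratio = ratio_benchmark(small, shares, large_region)
diff = difference_benchmark(small, shares, large_region)
print(np.round(ratio, 4), np.round(diff, 4))
```

Both adjustments satisfy the single aggregate constraint, but neither guarantees that benchmarked proportions stay within [0, 1]: the ratio version, for instance, would push an estimate close to 1 above 1 whenever it scales estimates up. Additional constraints of this kind are where such simple adjustments can fall short.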