Joint Program in Survey Methodology
Permanent URI for this community: http://hdl.handle.net/1903/2251
Search Results
3 results
Item: Optimizing stratified sampling allocations to account for heteroscedasticity and nonresponse (2023)
Mendelson, Jonathan; Elliott, Michael R; Lahiri, Partha

Neyman's seminal 1934 paper and the developments of the following two decades transformed the practice of survey sampling and continue to provide the underpinnings of today's probability samples, including at the design stage. Although hugely useful, the assumptions underlying classic theory on optimal allocation, such as complete response and exact knowledge of stratum variances, are not always met, nor is the design-based approach the only way to identify good sample allocations. This thesis develops new ways to allocate samples for stratified random sampling (STSRS) designs. In Papers 1 and 2, I provide a Bayesian approach for optimal STSRS allocation for estimating the finite population mean via a univariate regression model with heteroscedastic errors. I draw on Bayesian decision theory for optimal experimental design, which accommodates uncertainty in design parameters. By allowing for heteroscedasticity, I aim for improved realism in some establishment survey contexts compared with earlier Bayesian sample design work. Paper 1 assumes that the level of heteroscedasticity is known, which facilitates analytical results. Paper 2 relaxes this assumption, which makes the problem analytically intractable; I therefore develop a computational approach that uses Monte Carlo sampling to estimate the loss for a given allocation, in conjunction with a stochastic optimization algorithm that accommodates noisy loss functions. In simulation, the proposed approaches performed as well as or better than the design-based and model-assisted strategies considered, while having clearer theoretical justification.

Paper 3 shifts focus to how to account for nonresponse when designing samples. Existing theory on optimal STSRS allocation generally assumes complete response. A common practice is to allocate the sample under an assumption of complete response and then inflate the stratum sample sizes by the inverse of the anticipated response rates. I show that this practice overcorrects for nonresponse, leading to excessive costs per effective interview. I extend the existing design-based framework for STSRS allocation to accommodate scenarios with incomplete response, and I provide theoretical comparisons between my allocation and common alternatives, which illustrate how response rates, population characteristics, and cost structure can affect the methods' relative efficiency. In an application to a self-administered survey of military personnel, the proposed allocation resulted in a 25% increase in effective sample size compared with common alternatives.
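To make the design-based baseline concrete, here is a minimal Python sketch of classic Neyman allocation followed by the common inverse-response-rate inflation that Paper 3 argues overcorrects. All numbers (stratum sizes, standard deviations, response rates) are hypothetical, and the thesis's own Bayesian and extended design-based allocations are not shown.

```python
import numpy as np

# Hypothetical strata: population sizes and (assumed known) unit standard
# deviations, with variances rising across strata as in establishment surveys.
N_h = np.array([5000, 2000, 500])   # stratum population sizes
S_h = np.array([1.0, 2.5, 8.0])     # stratum standard deviations
n_total = 600                       # total sample size to allocate

# Classic Neyman allocation: n_h proportional to N_h * S_h.
alloc = N_h * S_h
n_h = n_total * alloc / alloc.sum()

# Common nonresponse adjustment: inflate each stratum by the inverse of its
# anticipated response rate -- the baseline practice Paper 3 shows overcorrects.
r_h = np.array([0.6, 0.4, 0.3])     # anticipated stratum response rates
n_h_inflated = n_h / r_h

print(np.round(n_h), np.round(n_h_inflated))
```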
Item: Bayesian Methods for Prediction of Survey Data Collection Parameters in Adaptive and Responsive Designs (2020)
Coffey, Stephanie Michelle; Elliott, Michael R

Adaptive and responsive survey designs rely on estimates of survey data collection parameters (SDCPs), such as response propensity, to make intervention decisions during data collection. These interventions are made with some data collection goal in mind, such as maximizing data quality for a fixed cost or minimizing costs for a fixed measure of data quality. Data quality may be defined by response rate, sample representativeness, or error in survey estimates, so accurate predictions of SDCPs are critical. Predictions within a data collection period are most commonly generated using fixed information about sample cases together with accumulating paradata and survey response data. Because interventions occur during the data collection period, however, they are applied based on predictions from incomplete accumulating data, and there is evidence that such incomplete data can lead to biased and unstable predictions, particularly early in data collection. This dissertation explores the use of Bayesian methods to improve predictions of SDCPs during data collection by providing a mathematical framework for combining priors, based on external data about covariates in the prediction models, with the current accumulating data to generate posterior predictions of SDCPs for use in intervention decisions.

The dissertation comprises three self-contained papers, each focused on the use of Bayesian methods to improve predictions of SDCPs for adaptive and responsive survey designs. The first paper predicts time to first contact, with priors generated from historical survey data. The second paper implements expert elicitation, a method for constructing priors when historical data are not available. The last paper describes a data collection experiment, conducted within a Bayesian framework, that attempts to minimize data collection costs without reducing the quality of a key survey estimate. In all three papers, the use of Bayesian methods yields modest improvements in predictions of SDCPs, especially early in data collection, when interventions have the largest effect on survey outcomes. Additionally, the experiment in the last paper produced significant data collection cost savings without a significant effect on a key survey estimate. This work suggests that Bayesian methods can improve predictions of SDCPs that are critical for adaptive and responsive data collection interventions.
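To illustrate the prior-plus-accumulating-data idea, below is a minimal Beta-Binomial sketch in Python of the kind of conjugate updating such a framework can use, applied to a response propensity. The prior strength, counts, and rates are all hypothetical, and the dissertation's actual models (e.g., for time to first contact) are more elaborate.

```python
from scipy import stats

# Hypothetical prior from a past administration of the survey: a 30% response
# rate, down-weighted to an effective prior sample size of 50 cases.
hist_rate, prior_n = 0.30, 50
alpha0 = hist_rate * prior_n            # prior "responses"    (15)
beta0 = (1 - hist_rate) * prior_n       # prior "nonresponses" (35)

# Accumulating data early in the field period: 12 responses from 80 cases.
# The purely data-driven rate (0.15) is unstable this early in collection.
resp, attempts = 12, 80

# Conjugate Beta-Binomial update gives the posterior for response propensity,
# pulling the noisy early estimate toward the historical prior.
posterior = stats.beta(alpha0 + resp, beta0 + attempts - resp)

print(round(posterior.mean(), 3))       # stabilized estimate, about 0.208
print(posterior.interval(0.95))         # credible interval for decisions
```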
Item: Improving External Validity of Epidemiologic Analyses by Incorporating Data from Population-Based Surveys (2020)
Wang, Lingxiao; Li, Yan

Many epidemiologic studies forgo probability sampling and turn to volunteer-based samples because of cost, confidentiality, response burden, and the invasiveness of biological samples. However, the volunteers may not represent the underlying target population, mainly because of self-selection bias, so standard epidemiologic analyses may not generalize to the target population; this shortcoming is called a lack of "external validity." In survey research, propensity score (PS)-based approaches have been developed to improve the representativeness of nonprobability samples by using population-based surveys as references. These approaches create a set of "pseudo-weights" to weight the nonprobability sample up to the target population. There are two main types of PS-based approaches: (1) PS-based weighting methods, which use PSs to estimate participation rates of the nonprobability sample, for example, inverse-PS weighting (IPSW); and (2) PS-based matching methods, which use PSs to measure similarity between units in the nonprobability sample and the reference survey sample, such as PS adjustment by subclassification (PSAS).

Although PS-based weighting methods reduce bias, they are sensitive to propensity model misspecification and can be inefficient. PS-based matching methods are more robust to propensity model misspecification and can avoid extreme weights, but matching methods such as PSAS are less effective at bias reduction. This dissertation proposes a novel PS-based matching method, the kernel weighting (KW) approach, to improve the external validity of epidemiologic analyses while achieving a better bias–variance tradeoff. A unifying framework is established for PS-based methods, providing three advances. First, the KW method is proved to provide consistent estimates and generally has smaller mean squared error than IPSW. Second, the framework reveals a previously unrecognized strong exchangeability assumption (SEA) underlying the existing PS-based matching methods; the SEA is relaxed to a weak exchangeability assumption that is more realistic for data analysis. Third, survey weights are scaled in propensity estimation to reduce the variance of the estimated PSs and improve the efficiency of all PS-based methods under the framework. The performance of the proposed PS-based methods is evaluated for estimating the prevalence of diseases and associations between risk factors and disease in a finite population.
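For concreteness, here is a schematic Python sketch of the IPSW baseline described above (not the dissertation's KW method): a weighted logistic regression estimates each volunteer's participation propensity relative to a weighted reference survey, and pseudo-weights are taken proportional to the inverse of the estimated propensity. The data, model, and rescaling choices are illustrative assumptions, and published IPSW variants differ in their details.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: a volunteer sample that over-represents high values of x,
# and a probability-based reference survey with design weights.
n_vol, n_ref = 1000, 500
x_vol = rng.normal(0.5, 1.0, n_vol)     # volunteer (nonprobability) sample
x_ref = rng.normal(0.0, 1.0, n_ref)     # reference survey sample
w_ref = np.full(n_ref, 40.0)            # reference survey design weights

# Stack the samples and model membership in the volunteer sample,
# weighting reference cases by their survey weights.
X = np.concatenate([x_vol, x_ref]).reshape(-1, 1)
z = np.concatenate([np.ones(n_vol), np.zeros(n_ref)])
w = np.concatenate([np.ones(n_vol), w_ref])

fit = LogisticRegression().fit(X, z, sample_weight=w)
p_vol = fit.predict_proba(X[:n_vol])[:, 1]   # estimated participation propensity

# Pseudo-weights: inverse of the estimated propensity, rescaled to the
# population total implied by the reference survey weights.
pseudo_w = 1.0 / p_vol
pseudo_w *= w_ref.sum() / pseudo_w.sum()

# The pseudo-weighted mean of x should move from the volunteers' naive mean
# toward the reference (population) mean near zero.
print(round(x_vol.mean(), 3), round(np.average(x_vol, weights=pseudo_w), 3))
```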