Selection Bias in Nonprobability Surveys: A Causal Inference Approach
Mercer, Andrew William
MetadataShow full item record
Many in the survey research community have expressed concern at the growing popularity of nonprobability surveys. The absence of random selection prompts justified concerns about self-selection producing biased results and means that traditional, design-based estimation is inappropriate. The Total Survey Error (TSE) paradigm’s designations of selection bias as attributable to undercoverage or nonresponse are not especially helpful for nonprobability surveys as they are based on an implicit assumption that selection and inferences rely on randomization. This dissertation proposes an alternative classification for sources of selection bias for nonprobability surveys based on principles borrowed from the field of causal inference. The proposed typology describes selection bias in terms of the three conditions that are required for a statistical model to correct or explain systematic differences between a realized sample and the target population: exchangeability, positivity, and composition. We describe the parallels between causal and survey inference and explain how these three sources of bias operate in both probability and nonprobability survey samples. We then provide a critical review of current practices in nonprobability data collection and estimation viewed through the lens of the causal bias framework. Next, we show how net selection bias can be decomposed into separate additive components associated with exchangeability, positivity, and composition respectively. Using 10 parallel nonprobability surveys from different sources, we estimate these components for six measures of civic engagement using the 2013 Current Population Survey Civic Engagement Supplement as a reference sample. We find that a large majority of the bias can be attributed to a lack of exchangeability. Finally, using the same six measures of civic engagement, we compare the performance of four approaches to nonprobability estimation based on Bayesian additive regression trees. These are propensity weighting (PW), outcome regression (OR), and two types of doubly-robust estimators: outcome regression with a residual bias correction (OR-RBC) and outcome regression with a propensity score covariate (OR-PSC). We find that OR-RBC tends to have the lowest bias, variance, and RMSE, with PW only slightly worse on all three measures.