Weight Adjustment Methods and Their Impact on Sample-based Inference
Henry, Kimberly Anne
Valliant, Richard V
MetadataShow full item record
Weighting samples is important to reflect not only sample design decisions made at the planning stage, but also practical issues that arise during data collection and cleaning that necessitate weighting adjustments. Adjustments to base weights are used to account for these planned and unplanned eventualities. Often these adjustments lead to variations in the survey weights from the original selection weights (i.e., the weights based solely on the sample units' probabilities of selection). Large variation in survey weights can cause inferential problems for data users. A few extremely large weights in a sample dataset can produce unreasonably large estimates of national- and domain-level estimates and their variances in particular samples, even when the estimators are unbiased over many samples. Design-based and model-based methods have been developed to adjust such extreme weights; both approaches aim to trim weights such that the overall mean square error (MSE) is lowered by decreasing the variance more than increasing the square of the bias. Design-based methods tend to be ad hoc, while Bayesian model-based methods account for population structure but can be computationally demanding. I present three research papers that expand the current weight trimming approaches under the goal of developing a broader framework that connects gaps and improves the existing alternatives. The first paper proposes more in-depth investigations of and extensions to a newly developed method called generalized design-based inference, where we condition on the realized sample and model the survey weight as a function of the response variables. This method has potential for reducing the MSE of a finite population total estimator in certain circumstances. However, there may be instances where the approach is inappropriate, so this paper includes an in-depth examination of the related theory. The second paper incorporates Bayesian prior assumptions into model-assisted penalized estimators to produce a more efficient yet robust calibration-type estimator. I also evaluate existing variance estimators for the proposed estimator. Comparisons to other estimators that are in the literature are also included. In the third paper, I develop summary- and unit-level diagnostic tools that measure the impact of variation of weights and of extreme individual weights on survey-based inference. I propose design effects to summarize the impact of variable weights produced under calibration weighting adjustments under single-stage and cluster sampling. A new diagnostic for identifying influential, individual points is also introduced in the third paper.