Collinearity Diagnostics for Complex Survey Data
Survey data are often used to fit models. Unlike in a designed experiment, the values of the covariates used in modeling are not controlled, so collinearity among the covariates is an inevitable problem in the analysis of survey data. Although many books and articles have described the collinearity problem and proposed strategies to understand, assess, and handle it, the survey literature has not provided appropriate diagnostic tools for evaluating its impact on regression estimation when survey complexities are taken into account. The goal of this research is to extend and adapt the conventional ordinary least squares collinearity diagnostics to complex survey data when a linear model or generalized linear model is used. In this dissertation we develop methods that generally have either a model-based or a design-based interpretation. We assume that an analyst uses survey-weighted regression estimators to estimate both underlying model parameters (assuming a correctly specified model) and census-fit parameters in the finite population. Diagnostic statistics, namely variance inflation factors (VIFs), condition indexes, and variance decomposition proportions, are constructed to evaluate the impact of collinearity and to determine which variables are involved. Survey weights are components of the diagnostic statistics, and the estimated variances of the coefficients are obtained from design-consistent estimators that account for complex design features such as clustering and stratification. The methods are illustrated using data from a survey of mental health organizations and a household survey of health and nutrition. We demonstrate that specialized collinearity diagnostic statistics are needed to account for survey weights and for the complex finite population features that are reflected in the sample design and considered in the regression analysis.
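To make the three diagnostics concrete, the following is a minimal sketch of how survey weights might enter them in the linear-model case. It is not the estimator developed in this dissertation: it simply applies the conventional VIF, condition index, and variance decomposition formulas to the weight-scaled design matrix, and it ignores the design-consistent variance estimation (clustering, stratification) described above. The function names `weighted_vif` and `weighted_condition_diagnostics` and the simulated data are hypothetical, introduced only for illustration.

```python
import numpy as np


def weighted_vif(X, w):
    """Simplified weighted VIFs (assumed analog, not the design-adjusted version).

    X: n x p covariate matrix without an intercept column; w: survey weights.
    Each VIF is 1 / (1 - R_k^2), where R_k^2 comes from a weighted least
    squares regression of column k on an intercept and the other columns.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    sw = np.sqrt(w)
    n, p = X.shape
    vifs = np.empty(p)
    for k in range(p):
        y = X[:, k]
        Z = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        # Weighted least squares via ordinary least squares on sqrt(w)-scaled data.
        beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
        resid = y - Z @ beta
        ybar = np.average(y, weights=w)
        r2 = 1.0 - np.sum(w * resid**2) / np.sum(w * (y - ybar) ** 2)
        vifs[k] = 1.0 / (1.0 - r2)
    return vifs


def weighted_condition_diagnostics(X, w):
    """Condition indexes and variance decomposition proportions of W^(1/2) X.

    X should include the intercept column if the model has one.
    Returns (cond_idx, props), where props[j, k] is the share of the variance
    of coefficient k associated with singular value j.
    """
    X = np.asarray(X, dtype=float)
    Xw = X * np.sqrt(np.asarray(w, dtype=float))[:, None]
    Xs = Xw / np.linalg.norm(Xw, axis=0)      # scale columns to unit length
    _, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_idx = d.max() / d
    phi = (Vt.T ** 2) / d**2                  # phi[k, j] = v_kj^2 / d_j^2
    props = phi / phi.sum(axis=1, keepdims=True)
    return cond_idx, props.T


# Simulated illustration (hypothetical data): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
w = rng.uniform(1.0, 5.0, size=n)             # stand-in for survey weights

covariates = np.column_stack([x1, x2, x3])
vifs = weighted_vif(covariates, w)
design = np.column_stack([np.ones(n), covariates])
cond_idx, props = weighted_condition_diagnostics(design, w)
worst = int(np.argmax(cond_idx))              # component with the largest index
```

In this simulated setting the VIFs for `x1` and `x2` are very large while the VIF for `x3` stays near one, and the row `props[worst]` shows which coefficients load on the near-dependency. A design-based version would additionally replace the model-based variance with a design-consistent variance estimator.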