Regression Diagnostics for Complex Survey Data: Identification of Influential Observations

Li, Jianzhu

dc.contributor.advisor	Valliant, Richard	en_US
dc.contributor.author	Li, Jianzhu	en_US
dc.contributor.department	Survey Methodology	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2008-04-22T16:01:25Z
dc.date.available	2008-04-22T16:01:25Z
dc.date.issued	2007-09-13	en_US
dc.description.abstract	Discussions of diagnostics for linear regression models have become indispensable chapters or sections in most statistical textbooks. However, the survey literature has given little attention to this problem. Examples from real surveys show that including or excluding a small number of sampled units can sometimes change the regression parameter estimates greatly, which indicates that techniques for identifying influential units are necessary. The goal of this research is to extend and adapt the conventional ordinary least squares influence diagnostics to complex survey data and to determine how they should be justified. We assume that an analyst is looking for a linear regression model that fits reasonably well for the bulk of the finite population and chooses to use the survey-weighted regression estimator. Diagnostic statistics such as DFBETAS, DFFITS, and a modified Cook's Distance are constructed to evaluate the effect on the regression coefficients of deleting a single observation. As components of the diagnostic statistics, the estimated variances of the coefficients are obtained from design-consistent estimators that account for complex design features, e.g., clustering and stratification. For survey data, sample weights, which are computed with the primary goal of estimating finite population statistics, are a source of influence in addition to the response variable and the predictor variables, and therefore need to be incorporated into influence measurement. The forward search method is also adapted to identify influential observations as a group when there may be a masking effect among the outlying observations. Two case studies and simulations are carried out in this dissertation to test the performance of the adapted diagnostic statistics. We conclude that removing the identified influential observations from the model fitting can yield less biased coefficient estimates. The standard errors of the coefficients may be underestimated, since the variation in the number of observations used in the regressions was not accounted for.	en_US
dc.format.extent	3045157 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1903/7598
dc.language.iso	en_US
dc.subject.pqcontrolled	Statistics	en_US
dc.subject.pqcontrolled	Social Work	en_US
dc.title	Regression Diagnostics for Complex Survey Data: Identification of Influential Observations	en_US
dc.type	Dissertation	en_US
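
The single-observation deletion idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the dissertation's method: it computes only the raw change in survey-weighted least squares coefficients when each unit is dropped, and it omits the scaling by design-consistent variance estimates that DFBETAS would require. The function names (weighted_fit, dfbeta_by_deletion), the toy data, and the weights are assumptions made for illustration.

```python
import numpy as np


def weighted_fit(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy for b."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


def dfbeta_by_deletion(X, y, w):
    """Return the full-sample fit and, for each unit i, the change in the
    coefficients (full fit minus fit without unit i), by brute-force refitting."""
    beta = weighted_fit(X, y, w)
    n = X.shape[0]
    change = np.empty((n, X.shape[1]))
    for i in range(n):
        keep = np.arange(n) != i
        change[i] = beta - weighted_fit(X[keep], y[keep], w[keep])
    return beta, change


# Hypothetical toy sample: 30 units, intercept plus one predictor.
rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)
w = np.ones(n)

# Make the last unit outlying in x and y and give it a large weight (assumed values).
x[-1], y[-1], w[-1] = 5.0, -10.0, 20.0

X = np.column_stack([np.ones(n), x])
beta, dfbeta = dfbeta_by_deletion(X, y, w)

# Unit whose deletion changes the slope estimate the most.
most_influential = int(np.argmax(np.abs(dfbeta[:, 1])))
print(most_influential)
```

Because the last unit combines an outlying response, an outlying predictor value, and a large weight, its deletion moves the slope far more than deleting any other unit, which is the kind of joint influence of weights, response, and predictors that the abstract describes. Brute-force refitting is chosen here for clarity over the equivalent closed-form update.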

Files

Original bundle

Name: umi-umd-4863.pdf
Size: 2.9 MB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Joint Program in Survey Methodology Theses and Dissertations