Effects of Model Selection on the Coverage Probability of Confidence Intervals in Binary-Response Logistic Regression

dc.contributor.advisor: Dayton, C. Mitchell (en_US)
dc.contributor.author: Zhang, Dongquan (en_US)
dc.contributor.department: Measurement, Statistics and Evaluation (en_US)
dc.contributor.publisher: Digital Repository at the University of Maryland (en_US)
dc.contributor.publisher: University of Maryland (College Park, Md.) (en_US)
dc.date.accessioned: 2008-10-11T05:43:55Z
dc.date.available: 2008-10-11T05:43:55Z
dc.date.issued: 2008-07-24 (en_US)
dc.description.abstract: While model selection is viewed as a fundamental task in data analysis, it imposes considerable effects on the subsequent inference. In applied statistics, it is common to select a model with a data-driven approach and then draw inference conditional on the selected model, as if it were given a priori. Parameter estimates obtained this way, however, generally do not reflect uncertainty about the model structure. As far as confidence intervals are concerned, it is often misleading to report estimates at the conventional 1−α level without considering the possible post-model-selection impact. This paper addresses the coverage probability of confidence intervals for logit coefficients in binary-response logistic regression. We conduct simulation studies to examine the performance of the automatic model selectors AIC and BIC and their subsequent effects on the actual coverage probability of interval estimates. Important considerations that may have key influence (e.g. model structure, covariate correlation) are investigated. This study contributes a quantitative understanding of how post-model-selection confidence intervals perform in terms of coverage in binary-response logistic regression models. A major conclusion is that, while the actual coverage probability is usually below the nominal level, there is no simple predictable pattern in how, and how far, it may fall. The coverage probability varies with multiple factors: (1) While the model structure always plays a role of paramount importance, the covariate correlation significantly affects the interval's coverage, with a higher correlation tending to yield a lower coverage probability. (2) No evidence shows that AIC inevitably outperforms BIC in achieving higher coverage probability, or vice versa. The model selector's performance depends on the uncertain model structure and/or the unknown parameter vector θ. (3) While the effect of sample size is intriguing, a larger sample size does not necessarily achieve asymptotically more accurate inference on interval estimates. (4) Although the binary threshold of the logistic model may affect the coverage probability, this effect is less important. It is more likely to become substantial with an unrestricted model when extreme values of other factors (e.g. small sample size, high covariate correlation) are observed. (en_US)
dc.format.extent: 909814 bytes
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/1903/8538
dc.language.iso: en_US
dc.subject.pqcontrolled: Education, Tests and Measurements (en_US)
dc.subject.pqcontrolled: Statistics (en_US)
dc.subject.pquncontrolled: Model Selection (en_US)
dc.subject.pquncontrolled: Coverage Probability (en_US)
dc.subject.pquncontrolled: Logistic Regression (en_US)
dc.title: Effects of Model Selection on the Coverage Probability of Confidence Intervals in Binary-Response Logistic Regression (en_US)
dc.type: Dissertation (en_US)
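
The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the dissertation's actual design. The two candidate models (with and without a second covariate `x2`), the true coefficients, the sample size, the correlation `rho`, and the function names `solve`, `fit_logit` and `simulate_coverage` are all assumptions chosen for illustration. The sketch fits both candidates by Newton-Raphson, picks one by AIC or BIC, builds a nominal 95% Wald interval for the coefficient of `x1` from the selected model only, and reports how often that interval covers the true value.

```python
import math
import random


def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logit(X, y, iters=25, tol=1e-8):
    """Newton-Raphson ML fit; returns (coefficients, standard errors, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    info = None
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            # Linear predictor, clipped to keep exp() well inside float range.
            eta = max(-30.0, min(30.0, sum(b * v for b, v in zip(beta, xi))))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Wald standard errors from the inverse Fisher information of the last iteration.
    se = [math.sqrt(solve(info, [float(i == j) for i in range(p)])[j])
          for j in range(p)]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = max(-30.0, min(30.0, sum(b * v for b, v in zip(beta, xi))))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, se, loglik


def simulate_coverage(criterion, n=100, rho=0.5, beta=(0.0, 0.5, 0.2),
                      reps=200, seed=1):
    """Share of replications whose post-selection 95% Wald interval covers beta[1].

    All defaults are illustrative assumptions, not values from the dissertation.
    """
    rng = random.Random(seed)
    z = 1.959964  # standard normal quantile for a nominal 95% interval
    penalty = 2.0 if criterion == "aic" else math.log(n)
    hits = 0
    for _ in range(reps):
        rows, y = [], []
        for _ in range(n):
            x1 = rng.gauss(0.0, 1.0)
            x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            eta = beta[0] + beta[1] * x1 + beta[2] * x2
            y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
            rows.append([1.0, x1, x2])
        best = None
        for cols in ([0, 1], [0, 1, 2]):  # restricted vs. full candidate model
            X = [[r[c] for c in cols] for r in rows]
            b, se, ll = fit_logit(X, y)
            score = -2.0 * ll + penalty * len(cols)
            if best is None or score < best[0]:
                best = (score, b[1], se[1])
        _, b1, se1 = best
        hits += (b1 - z * se1 <= beta[1] <= b1 + z * se1)
    return hits / reps
```

Calling `simulate_coverage("aic")` and `simulate_coverage("bic")` with the same seed compares the two selectors on identical simulated data; varying `rho`, `n` or `beta` in this toy setup mirrors the factors the abstract lists, though any resulting numbers describe only this sketch and not the dissertation's findings.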

Files

Original bundle
Name: umi-umd-5618.pdf
Size: 888.49 KB
Format: Adobe Portable Document Format