Effects of Model Selection on the Coverage Probability of Confidence Intervals in Binary-Response Logistic Regression

Zhang, Dongquan

dc.contributor.advisor	Dayton, C. Mitchell	en_US
dc.contributor.author	Zhang, Dongquan	en_US
dc.contributor.department	Measurement, Statistics and Evaluation	en_US
dc.contributor.publisher	Digital Repository at the University of Maryland	en_US
dc.contributor.publisher	University of Maryland (College Park, Md.)	en_US
dc.date.accessioned	2008-10-11T05:43:55Z
dc.date.available	2008-10-11T05:43:55Z
dc.date.issued	2008-07-24	en_US
dc.description.abstract	While model selection is viewed as a fundamental task in data analysis, it has considerable effects on the subsequent inference. In applied statistics, it is common to select a model by a data-driven approach and then draw inference conditional on the selected model, as if it were given a priori. Parameter estimates following this procedure, however, generally do not reflect uncertainty about the model structure. As far as confidence intervals are concerned, it is often misleading to report estimates at the conventional 1−α level without considering possible post-model-selection impact. This paper addresses the coverage probability of confidence intervals of logit coefficients in binary-response logistic regression. We conduct simulation studies to examine the performance of the automatic model selectors AIC and BIC, and their subsequent effects on the actual coverage probability of interval estimates. Important considerations (e.g. model structure, covariate correlation) that may have key influence are investigated. This study contributes a quantitative understanding of how post-model-selection confidence intervals perform in terms of coverage in binary-response logistic regression models. A major conclusion is that, while the actual coverage probability of confidence intervals is usually below the nominal level, there is no simple predictable pattern in how, or how far, it may fall. The coverage probability varies with the effects of multiple factors: (1) While the model structure always plays a role of paramount importance, the covariate correlation significantly affects the interval's coverage, with the tendency that a higher correlation indicates a lower coverage probability. (2) There is no evidence that AIC inevitably outperforms BIC in terms of achieving higher coverage probability, or vice versa. The model selector's performance depends on the uncertain model structure and/or the unknown parameter vector θ. (3) While the effect of sample size is intriguing, a larger sample size does not necessarily achieve asymptotically more accurate inference on interval estimates. (4) Although the binary threshold of the logistic model may affect the coverage probability, this effect is less important. It is more likely to become substantial with an unrestricted model when extreme values along the dimensions of other factors (e.g. small sample size, high covariate correlation) are observed.	en_US
dc.format.extent	909814 bytes
dc.format.mimetype	application/pdf
dc.identifier.uri	http://hdl.handle.net/1903/8538
dc.language.iso	en_US
dc.subject.pqcontrolled	Education, Tests and Measurements	en_US
dc.subject.pqcontrolled	Statistics	en_US
dc.subject.pquncontrolled	Model Selection	en_US
dc.subject.pquncontrolled	Coverage Probability	en_US
dc.subject.pquncontrolled	Logistic Regression	en_US
dc.title	Effects of Model Selection on the Coverage Probability of Confidence Intervals in Binary-Response Logistic Regression	en_US
dc.type	Dissertation	en_US
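
The kind of simulation the abstract describes can be sketched as follows. This is a minimal illustration, not the dissertation's actual design: the true coefficients, the two nested candidate models, the Wald-type 95% interval, and every function name (fit_logit, simulate, and so on) are assumptions made for this example. It generates correlated covariates, selects between the candidates by AIC or BIC, and records how often the interval for the first slope, taken from the selected model, contains the true value.

```python
import math
import random


def invert(a):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        d = m[c][c]
        m[c] = [v / d for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def score_and_info(X, y, beta):
    """Gradient, information matrix and log-likelihood of a logit model."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    loglik = 0.0
    for xi, yi in zip(X, y):
        # Clip the linear predictor to keep exp() numerically safe.
        eta = max(-30.0, min(30.0, sum(b * v for b, v in zip(beta, xi))))
        mu = 1.0 / (1.0 + math.exp(-eta))
        loglik += yi * eta - math.log1p(math.exp(eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info, loglik


def fit_logit(X, y, iters=25, tol=1e-8):
    """Newton-Raphson maximum likelihood; returns (beta, covariance, loglik)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad, info, _ = score_and_info(X, y, beta)
        cov = invert(info)
        step = [sum(cov[j][k] * grad[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info, loglik = score_and_info(X, y, beta)
    return beta, invert(info), loglik


def simulate(n=200, rho=0.5, beta=(0.0, 1.0, 0.3), reps=200,
             criterion="aic", seed=1):
    """Estimated coverage of the post-selection 95% Wald interval for beta[1].

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z = 1.959964  # standard normal 97.5% quantile
    hits = 0
    for _ in range(reps):
        rows, y = [], []
        for _ in range(n):
            x1 = rng.gauss(0.0, 1.0)
            x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            eta = beta[0] + beta[1] * x1 + beta[2] * x2
            y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
            rows.append((1.0, x1, x2))
        best = None
        # Candidates: intercept + x1, and intercept + x1 + x2 (the true model).
        for cols in ((0, 1), (0, 1, 2)):
            X = [[r[c] for c in cols] for r in rows]
            b, cov, ll = fit_logit(X, y)
            k = len(cols)
            penalty = 2 * k if criterion == "aic" else k * math.log(n)
            score = penalty - 2.0 * ll
            if best is None or score < best[0]:
                best = (score, b[1], math.sqrt(cov[1][1]))
        _, est, se = best
        hits += est - z * se <= beta[1] <= est + z * se
    return hits / reps


if __name__ == "__main__":
    for crit in ("aic", "bic"):
        print(crit, simulate(criterion=crit))
```

Varying n, rho, and the size of the coefficient on the second covariate in this sketch gives a rough feel for the factors the abstract lists; the numbers it produces are illustrative only and are not results from the dissertation.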

Files

Original bundle

Name: umi-umd-5618.pdf
Size: 888.49 KB
Format: Adobe Portable Document Format

Collections

UMD Theses and Dissertations
Human Development & Quantitative Methodology Theses and Dissertations