Theses and Dissertations from UMD

Permanent URI for this community: http://hdl.handle.net/1903/2

New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay before a given thesis/dissertation appears in DRUM.

More information is available at Theses and Dissertations at University of Maryland Libraries.

Search Results

Now showing 1 - 2 of 2
  • Item
    THE IMPACT OF PRELIMINARY MODEL SELECTION ON LATENT GROWTH MODEL PARAMETER ESTIMATES
    (2010) Wang, Hsiu-Fei; Hancock, Gregory R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In latent growth modeling (LGM), model selection and inference are treated as separate stages of data analysis, yet they are generally conducted on the assumption that the model is known a priori, so model selection and inference are performed on the same data set. This two-step process ignores the effects of model uncertainty on parameter estimation and statistical inference, and may thus lead to misleading or invalid inferences. The present study was designed to investigate the problems that can arise from the use of the two-step process in LGM. The goals of this study were: (1) to examine the subsequent impact of preliminary model selection using information criteria on LGM parameter estimates; (2) to assess the data splitting method as a possible way to mitigate the effects of model uncertainty. To achieve these goals, two Monte Carlo simulation studies were conducted. Study 1 conducted both model selection and parameter estimation on the same data set to investigate the impact of preliminary model selection in terms of relative parameter bias and coverage rates. Study 2 conducted model selection and parameter estimation on separate split-data sets in order to assess data splitting as a possible way to mitigate the effects of model uncertainty. The major finding was that inference following AIC- or BIC-based model selection introduces additional bias into the estimates and overestimates the sampling variability of the parameter estimates. The simulation results showed that the post-model-selection parameter estimator has larger relative parameter biases, larger relative variance biases, and lower confidence interval coverage rates than the pre-model-selection estimator. These post-model-selection problems due to model uncertainty, unfortunately, persisted when the data splitting method was applied.
  • Item
    Effects of Model Selection on the Coverage Probability of Confidence Intervals in Binary-Response Logistic Regression
    (2008-07-24) Zhang, Dongquan; Dayton, C. Mitchell; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    While model selection is viewed as a fundamental task in data analysis, it has considerable effects on subsequent inference. In applied statistics, it is common to take a data-driven approach to model selection and then draw inference conditional on the selected model, as if it were given a priori. Parameter estimates following this procedure, however, generally do not reflect uncertainty about the model structure. As far as confidence intervals are concerned, it is often misleading to report estimates based upon the conventional 1−α level without considering possible post-model-selection effects. This paper addresses the coverage probability of confidence intervals for logit coefficients in binary-response logistic regression. We conduct simulation studies to examine the performance of the automatic model selectors AIC and BIC and their subsequent effects on the actual coverage probability of interval estimates. Important factors that may have a key influence (e.g., model structure and covariate correlation) are investigated. This study contributes a quantitative understanding of how post-model-selection confidence intervals perform, in terms of coverage, in binary-response logistic regression models. A major conclusion was that, while the actual coverage probability is usually below the nominal level, there is no simple, predictable pattern to how far it may fall. The coverage probability varies with multiple factors: (1) While model structure always plays a role of paramount importance, covariate correlation significantly affects the interval's coverage, with higher correlation tending to produce lower coverage probability. (2) No evidence shows that AIC inevitably outperforms BIC in terms of achieving higher coverage probability, or vice versa; the model selector's performance depends upon the uncertain model structure and/or the unknown parameter vector θ. (3) The effect of sample size is intriguing: a larger sample does not necessarily yield asymptotically more accurate inference on interval estimates. (4) Although the binary threshold of the logistic model may affect the coverage probability, this effect is comparatively minor; it is most likely to become substantial in an unrestricted model combined with extreme values of other factors (e.g., small sample size, high covariate correlation).
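The pre-test estimation problem described in the first abstract can be sketched in a small Monte Carlo experiment. The sketch below is a minimal illustration only: it substitutes a simple linear-trend-vs-intercept-only regression for the latent growth models actually studied, and all sample sizes, effect sizes, and function names are assumptions, not the study's design. It shows the core mechanism the abstract describes: conditional on AIC retaining a model, the parameter estimate from that same data set is biased.

```python
import numpy as np

rng = np.random.default_rng(0)

def aic(rss, n, k):
    # Gaussian AIC up to an additive constant: n * log(RSS / n) + 2k
    return n * np.log(rss / n) + 2 * k

def post_selection_slope(n=30, slope=0.3, sigma=1.0):
    """AIC-select between intercept-only and linear-trend models on ONE
    sample, then report the slope estimate from the chosen model
    (0.0 if the intercept-only model is chosen)."""
    x = np.linspace(-1.0, 1.0, n)
    y = slope * x + rng.normal(0.0, sigma, n)
    X = np.column_stack([np.ones(n), x])
    beta, rss_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = float(rss_full[0])
    rss_null = float(np.sum((y - y.mean()) ** 2))
    if aic(rss_full, n, 2) < aic(rss_null, n, 1):
        return beta[1]
    return 0.0

true_slope = 0.3
est = np.array([post_selection_slope(slope=true_slope) for _ in range(2000)])
kept = est[est != 0.0]  # replications where AIC retained the trend model
# Conditional on selection, the slope estimate is biased away from zero,
# because only samples with large-looking slopes survive the AIC screen.
print(f"selected in {kept.size}/2000 reps; "
      f"mean slope given selection = {kept.mean():.2f} (truth {true_slope})")
```

Selecting and estimating on the same data makes the estimator a mixture of "exactly zero" and "inflated when retained", which is the additional bias and distorted sampling variability the abstract reports.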
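The coverage question in the second abstract can likewise be sketched with a small simulation. This is a hedged illustration, not the dissertation's actual design: the model pair (a reduced logistic model dropping one covariate vs. the full model), the correlation, the coefficient values, and all function names are assumptions chosen to show how AIC selection can pull the actual coverage of a nominal 95% Wald interval around.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logit(X, y, iters=25):
    """Logistic-regression MLE by Newton-Raphson; returns (beta, cov),
    where cov is the inverse Fisher information at the MLE."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p)
        beta = beta + np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov

def loglik(X, y, beta):
    eta = X @ beta
    return float(y @ eta - np.log1p(np.exp(eta)).sum())

def covers_after_selection(n=200, b=(-0.5, 1.0, 0.4), rho=0.5):
    """Simulate one sample, AIC-select between the reduced model {x1} and
    the full model {x1, x2}, then check whether the selected model's 95%
    Wald CI for the x1 coefficient covers its true value."""
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], n)
    X_full = np.column_stack([np.ones(n), z])
    p_true = 1.0 / (1.0 + np.exp(-(X_full @ np.array(b))))
    y = (rng.random(n) < p_true).astype(float)
    best_aic, best = np.inf, None
    for X in (X_full[:, :2], X_full):  # reduced model, then full model
        beta, cov = fit_logit(X, y)
        a = -2.0 * loglik(X, y, beta) + 2.0 * X.shape[1]
        if a < best_aic:
            best_aic, best = a, (beta, cov)
    beta, cov = best
    se = np.sqrt(cov[1, 1])
    return beta[1] - 1.96 * se <= b[1] <= beta[1] + 1.96 * se

coverage = np.mean([covers_after_selection() for _ in range(400)])
print(f"post-selection coverage of the nominal 95% CI: {coverage:.3f}")
```

Because the covariates are correlated, the reduced model's x1 coefficient targets a shifted value whenever AIC drops x2, so intervals reported at the conventional 1−α level need not achieve it; varying `rho`, `n`, and `b` in this sketch reproduces the kind of factor-by-factor exploration the abstract describes.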