The Impact of Model Selection on Loglinear Analysis of Contingency Tables

Date

2009

Abstract

It is common practice for researchers in the social sciences and education to use model selection techniques to search for best-fitting models and then to carry out inference as if these models had been specified a priori. This study examined the effect of model selection on inference in the framework of loglinear modeling. The purposes were to (i) examine the consequences of ignoring the model selection step; and (ii) investigate the performance of the estimator provided by the Bayesian model averaging method and evaluate the usefulness of multi-model inference as opposed to single-model inference.

The basic finding of this study was that inference based on a single "best fit" model chosen from a set of candidate models leads to underestimation of the sampling variability of the parameter estimates and induces additional bias in the estimates. The results of the simulation study showed that, because of model uncertainty, the post-model-selection parameter estimator has larger bias, standard error, and mean square error than the estimator obtained under the true model. The same results applied to the conditional odds ratio estimators. The primary reason for these results is that the sampling distribution of the post-model-selection estimator is, in actuality, a mixture of distributions from a set of candidate models. Thus, the variability of the post-model-selection estimator has a large component from selection bias. While these problems were alleviated as sample size increased, the interpretation of the p-values of the Z-statistics for the parameters was misleading even when the sample size was quite large. To avoid the problem of inference based on a single best model, Bayesian model averaging adopts a multi-model inference method, treating the weighted mean of the estimates from each model in the set as a point estimator, where the weights are derived using Bayes' theorem.
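
The weighted-mean estimator described above can be illustrated with a minimal Python sketch. The abstract does not say how the weights were computed in the study, so the sketch assumes a common approximation: posterior model probabilities proportional to the prior times exp(-BIC/2), with equal priors by default. The function names and the numbers in the usage lines are hypothetical and are not taken from the study. A candidate model that omits the parameter is assumed to contribute an estimate of 0 with standard error 0.

```python
import math


def bma_weights(bic_values, priors=None):
    """Approximate posterior model probabilities from BIC values.

    Assumption (not from the source): the marginal likelihood of each
    model is approximated by exp(-BIC / 2), and Bayes' theorem is then
    applied with the given prior model probabilities.
    """
    n = len(bic_values)
    if priors is None:
        priors = [1.0 / n] * n  # equal prior probability for every model
    best = min(bic_values)
    # Subtract the smallest BIC before exponentiating to avoid underflow.
    unnorm = [p * math.exp(-0.5 * (b - best))
              for b, p in zip(bic_values, priors)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def bma_estimate(estimates, std_errors, weights):
    """Model-averaged point estimate and its standard deviation.

    The variance adds the within-model variance to the between-model
    spread of the estimates, so model uncertainty is not ignored.
    """
    mean = sum(w * e for w, e in zip(weights, estimates))
    var = sum(w * (se ** 2 + (e - mean) ** 2)
              for w, e, se in zip(weights, estimates, std_errors))
    return mean, math.sqrt(var)


# Hypothetical inputs: three candidate loglinear models, with the
# estimate of one interaction parameter and its standard error in each.
bics = [100.0, 102.0, 110.0]
estimates = [0.80, 0.65, 0.00]  # third model omits the parameter
std_errors = [0.20, 0.18, 0.00]

weights = bma_weights(bics)
avg, sd = bma_estimate(estimates, std_errors, weights)
print("weights:", [round(w, 3) for w in weights])
print("averaged estimate:", round(avg, 3), "sd:", round(sd, 3))
```

The between-model term in the variance is what single-model inference leaves out: when the candidate models disagree about a parameter, the averaged standard deviation exceeds the standard error reported by any one model with a similar estimate.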
