Factors Influencing the Mixture Index of Model Fit in Contingency Tables Showing Independence
Dayton, C. Mitchell
Traditional methods for evaluating contingency table models, based on chi-square statistics or quantities derived from them, are unattractive in many applied research settings. The two-point mixture index of fit, pi-star, introduced by Rudas, Clogg and Lindsay (RCL, 1994), provides a new way to represent goodness of fit for contingency tables. This study (a) evaluated several techniques for dealing with sampling zeros when computing pi-star in contingency tables for which the independence assumption holds; (b) investigated the performance of the estimate across combinations of table size, marginal distribution and sample size; and (c) compared the standard error and confidence interval of pi-star estimated by the method proposed by RCL with the "true" standard error based on empirical simulations in various scenarios, especially with small sample sizes and pi-star close to zero. These goals were pursued with Monte Carlo simulation methods, which were then applied to two real data examples. The first is a 6 by 3 cross-classification of fatal crashes by speed limit and land use with 37,295 cases, based on 2004 USDOT traffic data; the second is a 4 by 4 cross-classification of eye color and hair color with 592 cases, reported in RCL. Results suggest that pi-star is positively biased away from zero, in a range from 2.98% to 40.86% in the conditions studied, when the independence assumption holds. Replacing zeros with larger flattening values results in smaller pi-star. For each table size, pi-star is smallest when both the row and column marginal distributions are extremely dispersed. For tables with small sample size and small table size, under all extremely dispersed and most slightly dispersed marginal distributions, the structural zero technique is superior to the other sampling zero techniques.
The lower bound for pi-star using the RCL method is generally close to the "true" estimate based on empirical parametric simulation. However, under some circumstances, the RCL method underestimates the lower bound, although the magnitude is relatively small and the difference shrinks as the sample size increases. This study provides guidance for researchers in the use of this important method for interpreting models fit to contingency tables.
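As a rough illustration of the simulation setting described above, the sketch below draws one contingency table under independence and replaces its sampling zeros with a flattening constant. It is not the study's code: the function names, the marginal probabilities, the sample size and the flattening constant are all hypothetical choices made for the example, and the computation of pi-star itself is not shown.

```python
import random


def simulate_independent_table(row_probs, col_probs, n, seed=None):
    """Draw n cases into an R x C table whose cell probabilities are the
    product of the row and column marginals (the independence model)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(row_probs), len(col_probs)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [row_probs[i] * col_probs[j] for i, j in cells]
    table = [[0] * n_cols for _ in range(n_rows)]
    for i, j in rng.choices(cells, weights=weights, k=n):
        table[i][j] += 1
    return table


def flatten_zeros(table, constant=0.5):
    """Replace each sampling zero with a small positive flattening constant
    (0.5 is an assumed default, not a value taken from the study)."""
    return [[constant if count == 0 else count for count in row]
            for row in table]


# Hypothetical scenario: extremely dispersed marginals and a small sample,
# which makes sampling zeros likely.
dispersed = [0.90, 0.05, 0.05]
table = simulate_independent_table(dispersed, dispersed, n=50, seed=1)
flattened = flatten_zeros(table, constant=0.1)
```

Repeating the draw many times and recording the estimate from each replicate would give the kind of empirical sampling distribution that the comparison with the RCL standard error relies on.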