REWEIGHTING DATA IN THE SPIRIT OF TUKEY: USING BAYESIAN POSTERIOR PROBABILITIES AS RASCH RESIDUALS FOR STUDYING MISFIT
A new variant of the iterative "data = fit + residual" data-analytical approach described by Mosteller and Tukey is proposed and implemented in the context of item response theory psychometric models. Posterior probabilities from a Bayesian mixture model of a Rasch item response theory model and an unscalable latent class are expressed as weights for the original data. The data weighted by the units' posterior probabilities for the unscalable class are used for further exploration of structures. Data were generated in accordance with departures from the Rasch model that have been studied in the literature. Factor analysis models are compared using the original data and the data as reweighted by the posterior probabilities for the unscalable class. Eigenvalues are compared with Horn's parallel analysis corresponding to each class of factor models to determine the number of factors in a dataset. In comparing the two weighted data sets, the Rasch weighted data and the data weighted as unscalable, clear differences are manifest. Pattern types are detected for the Rasch baselines that differ from those of random or systematic contamination. The Rasch baseline patterns are strongest around item difficulties that are closest to the mean generating value of the θ's. Patterns in baseline conditions weaken as item difficulties depart from zero and move toward the extreme values of ±6. The random contamination factor patterns are typically flat and near zero regardless of the item difficulty with which they are associated. Systematic contamination using reversed Rasch generated data produces patterns that differ from the Rasch baseline condition and in some conditions shows an opposite effect when compared to the Rasch patterns. Differences can also be detected within the residually weighted data between the Rasch generated subtest and the contaminated subtest.
In conditions that have identified factors, the Rasch subtest often has Rasch patterns and the contaminated subtest has some form of random/flat or systematic/reversed pattern.
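The reweighting idea summarized above can be illustrated with a minimal Python sketch. It is not the estimation procedure used in this work. It assumes the item difficulties `b` and the mixing proportion `pi_u` are known rather than estimated in a Bayesian mixture model, that the unscalable class answers every item at random with probability 0.5, and that ability is standard normal and integrated over a fixed grid. The function names (`posterior_unscalable`, `weighted_corr`, `parallel_analysis`) and all simulation settings are hypothetical and chosen only for illustration.

```python
import numpy as np

def posterior_unscalable(X, b, pi_u=0.2, n_nodes=41):
    """Posterior probability that each unit belongs to the unscalable class.

    Assumptions (not from the source): known difficulties b, known mixing
    proportion pi_u, random responding (p = 0.5) in the unscalable class,
    and a standard normal ability distribution on a fixed grid.
    """
    theta = np.linspace(-4.0, 4.0, n_nodes)
    prior = np.exp(-0.5 * theta**2)
    prior /= prior.sum()
    # Rasch success probabilities, shape (nodes, items)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    # Log-likelihood of each response pattern at each node, shape (units, nodes)
    loglik = X @ np.log(p).T + (1.0 - X) @ np.log(1.0 - p).T
    lik_rasch = np.exp(loglik) @ prior          # marginal Rasch likelihood
    lik_unscalable = 0.5 ** X.shape[1]          # random-responding likelihood
    num = pi_u * lik_unscalable
    return num / (num + (1.0 - pi_u) * lik_rasch)

def weighted_corr(X, w):
    """Item correlation matrix with each unit (row) weighted by w."""
    w = w / w.sum()
    Xc = X - w @ X
    cov = (Xc * w[:, None]).T @ Xc
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def parallel_analysis(R, n_units, n_sims=200, quantile=95, rng=None):
    """Count leading eigenvalues of R that exceed Horn-style random thresholds.

    Simplification: reference data are unweighted normal draws with n_units
    rows, so the effective sample size of weighted data is ignored.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_items = R.shape[0]
    observed = np.sort(np.linalg.eigvalsh(R))[::-1]
    sims = np.empty((n_sims, n_items))
    for s in range(n_sims):
        Z = rng.standard_normal((n_units, n_items))
        sims[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    exceeds = observed > np.percentile(sims, quantile, axis=0)
    return n_items if exceeds.all() else int(np.argmin(exceeds))

# Illustrative simulation: Rasch data with a block of randomly responding units.
rng = np.random.default_rng(0)
n_units, n_items = 500, 10
b = np.linspace(-2.0, 2.0, n_items)
theta = rng.standard_normal(n_units)
p_true = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
X = (rng.random((n_units, n_items)) < p_true).astype(float)
X[:100] = (rng.random((100, n_items)) < 0.5).astype(float)  # random contamination

w_unscalable = posterior_unscalable(X, b)       # weights for the unscalable class
w_rasch = 1.0 - w_unscalable                    # complementary Rasch weights
for label, w in (("Rasch weighted", w_rasch), ("unscalable weighted", w_unscalable)):
    k = parallel_analysis(weighted_corr(X, w), n_units, rng=rng)
    print(label, "factors retained:", k)
```

The two weighted correlation matrices play the role of the two reweighted data sets being compared. In a fuller treatment the posterior probabilities would come from the fitted mixture model, and the parallel analysis reference distribution would be matched to the factor model and the weighting in use.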