Exploring Unidimensional Proficiency Classification Accuracy From Multidimensional Data in a Vertical Scaling Context

When Item Response Theory (IRT) is operationally applied for large-scale assessments, unidimensionality is typically assumed. This assumption requires that the test measures a single latent trait. Furthermore, when tests are vertically scaled using IRT, the assumption of unidimensionality requires that the battery of tests across grades measures the same trait, just at different levels of difficulty. Many researchers have shown that this assumption may not hold for certain test batteries and, therefore, the results from applying a unidimensional model to multidimensional data may be called into question. This research investigated the impact on classification accuracy when multidimensional vertical scaling data are estimated with a unidimensional model. The multidimensional compensatory two-parameter logistic model (MC2PL) was the data-generating model for two levels of a test administered to simulees of correspondingly different abilities. Simulated data from the MC2PL model were estimated according to a unidimensional two-parameter logistic (2PL) model, and classification decisions were made from a simulated bookmark standard setting procedure based on the unidimensional estimation results. Those unidimensional classification decisions were compared to the "true" unidimensional classification (proficient or not proficient) of simulees in multidimensional space, obtained by projecting a simulee's generating two-dimensional theta vector onto a unidimensional scale via a number-correct transformation on the entire test battery (i.e., across both grades). Specifically, conditional classification accuracy measures were considered. That is, the proportion of truly not proficient simulees classified correctly and the proportion of truly proficient simulees classified correctly were the criterion variables.
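The data-generating step described above can be sketched as follows. This is a minimal Python illustration, not the study's actual simulation code: the item-parameter ranges, sample size, and ability correlation used here are assumed for demonstration only. Under the compensatory MC2PL, the probability of a correct response is the logistic function of a weighted sum of the two abilities plus an intercept, and the "true" unidimensional criterion is obtained by projecting each theta vector to an expected number-correct score on the full battery.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc2pl_prob(theta, a, d):
    """P(correct) under the compensatory multidimensional 2PL (MC2PL):
    logistic(a1*theta1 + a2*theta2 + d) for each person-item pair."""
    z = theta @ a.T + d                # shape: (n_persons, n_items)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical item bank spanning the whole battery (both grade levels).
n_items, n_persons = 60, 1000
a = rng.uniform(0.5, 2.0, size=(n_items, 2))  # discriminations on two dimensions
d = rng.normal(0.0, 1.0, size=n_items)        # intercepts (easiness)

# Correlated abilities; rho = 0.6 is an illustrative value for the
# manipulated correlation factor.
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
theta = rng.multivariate_normal([0.0, 0.0], cov, size=n_persons)

# Simulate dichotomous item responses from the MC2PL probabilities.
p = mc2pl_prob(theta, a, d)
responses = (rng.random(p.shape) < p).astype(int)

# "True" unidimensional projection: expected number correct on the entire
# battery, thresholded at an illustrative cut score (here, the median).
true_score = p.sum(axis=1)
cut = np.quantile(true_score, 0.5)
truly_proficient = true_score >= cut
```

In the study itself the cut score comes from a simulated bookmark standard setting on the unidimensional estimates; the median cut above is only a placeholder to make the projection step concrete.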
Manipulated factors in this simulation study included the confound of item difficulty with dimensionality, the difference in mean abilities on both dimensions of the simulees taking each test in the battery, the choice of common items used to link the exams, and the correlation of the two abilities. Results suggested that the correlation of the two abilities and the confound of item difficulty with dimensionality both had an effect on the conditional classification accuracy measures. There was little or no evidence that the choice of common items or the differences in mean abilities of the simulees taking each test had an effect.
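The conditional classification accuracy measures used as criterion variables can be computed as below. This is a hedged sketch with a hypothetical helper name (`conditional_accuracy`), not code from the study: given each simulee's true proficiency status (from the number-correct projection) and the estimated status (from the unidimensional model and cut score), it returns the proportion of truly not proficient simulees classified not proficient and the proportion of truly proficient simulees classified proficient.

```python
import numpy as np

def conditional_accuracy(true_prof, est_prof):
    """Conditional classification accuracy:
    (proportion of truly not proficient classified not proficient,
     proportion of truly proficient classified proficient)."""
    true_prof = np.asarray(true_prof, dtype=bool)
    est_prof = np.asarray(est_prof, dtype=bool)
    acc_not_prof = np.mean(~est_prof[~true_prof])  # correct among truly not proficient
    acc_prof = np.mean(est_prof[true_prof])        # correct among truly proficient
    return acc_not_prof, acc_prof

# Toy example: 8 simulees, 1 = proficient, 0 = not proficient.
t = [0, 0, 0, 1, 1, 1, 1, 0]   # true status
e = [0, 1, 0, 1, 1, 0, 1, 0]   # estimated status
acc_not_prof, acc_prof = conditional_accuracy(t, e)  # (0.75, 0.75)
```

Reporting the two proportions separately, rather than overall accuracy, is what makes the measures conditional: each is computed within one true-status group, so an effect on one group cannot be masked by the other.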