INVESTIGATING ITEM PARAMETER DRIFT (IPD) AMPLIFICATION AND CANCELLATION AT THE TESTLET-LEVEL ON MODEL PARAMETER ESTIMATION IN A MULTISTAGE TEST
A goal of test fairness is to design testing systems that measure performance with acceptable accuracy across a wide range of test-taker ability levels and across subgroups of the student population.
Research has shown that adaptive tests realize a high degree of test score accuracy. Furthermore, standard item response theory (IRT) models are the predominantly used measurement models in educational testing for computerized multistage adaptive tests (MSTs). Moreover, item sets, or testlets, are widely used item types in MSTs.
Fitting standard IRT models to response data carries item parameter invariance and local independence assumptions. In practice, unexpected shifts in parameter values across test administrations, known as item parameter drift (IPD), have been reported. Moreover, testlet items are known to exhibit local item dependence (LID) due to interactions between the test taker and the common testlet stimulus. When IPD and/or LID are present, standard IRT assumptions are likely violated, threatening ability estimate accuracy and test score validity. A conjecture in this study is that item-level IPD that is individually insignificant may accumulate into significant drift at the testlet level through amplification, or offset to insignificance through cancellation. To date, no studies have investigated the combined impact of IPD amplification or cancellation at the testlet level with LID on ability estimation accuracy in an MST system.
In this study, MST ability estimates generated under the two-parameter logistic (2PL) IRT and 2PL testlet response theory (TRT) models are compared to determine whether they differ significantly when amplification or cancellation of IPD at the testlet level occurs, both with and without LID. Further, this study examines the combined impact of testlet-level IPD amplification or cancellation and/or LID on MST routing performance.
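For reference, a standard formulation of the two models being compared (notation assumed, not taken from the study) is shown below; the TRT model augments the 2PL with a person-specific testlet effect that absorbs local item dependence:

```latex
% 2PL IRT: probability that person j answers item i correctly,
% with discrimination a_i, difficulty b_i, and ability \theta_j
P_{ij}(\theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_i)\right]}

% 2PL TRT: adds a person-specific testlet effect \gamma_{j d(i)} for the
% testlet d(i) containing item i, modeling LID within the testlet
P_{ij}(\theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_i - \gamma_{j d(i)})\right]}
```

When the testlet-effect variance is zero, the TRT model reduces to the standard 2PL, which is why the two models yield comparable ability estimates in the absence of LID.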
This study reveals that ability estimation, routing, and decision accuracy are not significantly impacted by combined amplification, cancellation, and/or LID effects. However, routing accuracy is impacted by module difference, routing error stage, and testlet effects. Finally, moderate-ability test takers are found to be more likely misclassified than low- or high-ability test takers.