Reporting subscores is a prevalent practice in standardized testing that provides diagnostic information for learning and instruction. Previous research has developed various methods for reporting subscores (e.g., de la Torre & Patz, 2005; Wainer et al., 2001; Wang, Chen, & Cheng, 2004; Yao & Boughton, 2007; Yen, 1987). However, the existing methods are not suitable for reporting subscores on a test with innovative item types, such as double-coded items and paired stimuli. This study proposes a two-parameter logistic doubly-testlet model with internal restrictions on item difficulties (2PL-DT-MIRID) to report subscores for a test with double-coded items embedded in paired testlets. The proposed model builds on the doubly-testlet model of Jiao and Lissitz (2014) and on the model with internal restrictions on item difficulties (MIRID; Butter, De Boeck, & Verhelst, 1998). It has four major advantages for reporting subscores: (a) it reports subscores for a test with double-coded items in complex scenario structures; (b) it reports subscores based on content clusters, which are more common in standardized tests than subscores based on construct dimensionality; (c) it is computationally less demanding than multidimensional item response theory (MIRT) models when estimating subscores; and (d) it can be used to conduct item response theory (IRT) based number-correct scoring (NCS; Yen, 1984a).
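To make the model description above more concrete, here is a minimal Python sketch of the two ingredients it combines. It assumes a 2PL item response function in which each item carries two additive, person-specific testlet effects (one for each testlet of a paired stimulus, using the sign convention theta - b + gamma) and a MIRID-style linear restriction in which a composite item's difficulty is a weighted sum of its component items' difficulties plus a constant. The function names and parameterization are illustrative assumptions, not the author's exact specification; the treatment of double-coded items and parameter estimation are not shown.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mirid_difficulty(component_difficulties, weights, tau=0.0):
    """MIRID-style restriction (assumed form): the composite item's difficulty
    is a weighted sum of its component items' difficulties plus a constant tau."""
    return sum(w * b for w, b in zip(weights, component_difficulties)) + tau


def p_correct_dt(theta, a, b, gamma_1, gamma_2):
    """Hypothetical 2PL doubly-testlet response probability.

    theta   : person ability
    a, b    : item discrimination and difficulty
    gamma_1 : person-specific effect of the first testlet the item belongs to
    gamma_2 : person-specific effect of the second (dual) testlet
    """
    return logistic(a * (theta - b + gamma_1 + gamma_2))


# Usage: components with difficulties 0.5 and -0.5 and unit weights give a
# composite difficulty of 0, so an average examinee with no testlet effects
# has a 50% chance of answering correctly.
b_composite = mirid_difficulty([0.5, -0.5], [1.0, 1.0])
p = p_correct_dt(theta=0.0, a=1.0, b=b_composite, gamma_1=0.0, gamma_2=0.0)
```

Keeping the two testlet effects additive inside the logit is what lets one item depend on both testlets of a pair without adding a separate ability dimension per testlet, which is consistent with the computational advantage over MIRT claimed above.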

A simulation study is conducted to evaluate model parameter recovery, subscore estimation, and subscore reliability. The simulation manipulates three factors: (a) the magnitude of testlet effect variation, (b) the correlation between the testlet effects of the dual testlets, and (c) the percentage of double-coded items in the test. In addition, the study compares the proposed model with underspecified competing models in terms of model parameter estimation and model fit.
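The three manipulated factors map naturally onto a data-generating step. The Python sketch below shows one way to draw them for a single replication, assuming that the two testlet effects for a pair of dual testlets follow a bivariate normal distribution with mean zero, a common variance, and correlation `rho`, and that double-coded items are chosen at random. The function names and the condition values in the usage line are hypothetical; the abstract does not report the actual levels used.

```python
import numpy as np


def simulate_dual_testlet_effects(n_persons, variance, rho, seed=0):
    """Draw person-specific effects (gamma_1, gamma_2) for one pair of dual
    testlets from a bivariate normal with mean 0, common variance `variance`
    (factor a) and correlation `rho` (factor b)."""
    cov = variance * np.array([[1.0, rho], [rho, 1.0]])
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n_persons)


def choose_double_coded(n_items, proportion, seed=0):
    """Flag a random subset of items as double-coded (factor c)."""
    rng = np.random.default_rng(seed)
    n_double = int(round(proportion * n_items))
    flags = np.zeros(n_items, dtype=bool)
    flags[rng.choice(n_items, size=n_double, replace=False)] = True
    return flags


# Hypothetical condition: testlet variance 0.5, correlation 0.3, 25% of 40 items double-coded.
effects = simulate_dual_testlet_effects(n_persons=20000, variance=0.5, rho=0.3)
double_coded = choose_double_coded(n_items=40, proportion=0.25)
```

Each simulation condition would then be one combination of `variance`, `rho`, and `proportion`, repeated over many seeds.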

The results of the simulation study show that the proposed 2PL-DT-MIRID generally yields more accurate model parameter and subscore estimates when the testlet effect variation is small, the dual testlets are weakly correlated, and the test contains more double-coded items. Across the study conditions, the proposed model outperforms the competing models in model parameter estimation. Reliability estimates from models that ignore the dual testlets are spuriously inflated, whereas the 2PL-DT-MIRID produces higher overall score reliability and subscore reliability than models that ignore double-coded items in most study conditions. In terms of model fit, none of the fit indices investigated in this study (i.e., AIC, BIC, and DIC) identifies the proposed true model as the best-fitting model at a satisfactory rate.
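For reference, the three fit indices follow their standard definitions: AIC = -2 log L + 2k, BIC = -2 log L + k ln N, and DIC = D̄ + p_D, where D̄ is the posterior mean deviance and p_D = D̄ - D(θ̄). The Python sketch below computes them and the "rate of identifying the true model" across replications, assuming lower values indicate better fit. What N stands for in BIC (often the number of examinees) is an assumption here, and the model labels in the usage line are hypothetical.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for maximized log-likelihood and k parameters."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian information criterion; n is the sample size used in the penalty."""
    return -2.0 * log_lik + k * math.log(n)


def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = mean posterior deviance + p_D, with p_D = D_bar - D(theta_bar)."""
    d_bar = sum(deviance_draws) / len(deviance_draws)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d


def selection_rate(index_by_replication, true_model):
    """Share of replications in which the true model has the smallest index value.

    index_by_replication: list of dicts mapping model label -> index value.
    """
    hits = sum(
        1 for rep in index_by_replication if min(rep, key=rep.get) == true_model
    )
    return hits / len(index_by_replication)


# Hypothetical DIC values for two replications and two competing models.
reps = [{"2PL-DT-MIRID": 1000.0, "2PL-testlet": 1010.0},
        {"2PL-DT-MIRID": 1020.0, "2PL-testlet": 1005.0}]
rate = selection_rate(reps, "2PL-DT-MIRID")
```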