Human Development & Quantitative Methodology
Permanent URI for this community: http://hdl.handle.net/1903/2248
The departments within the College of Education were reorganized and renamed as of July 1, 2011. This department incorporates the former departments of Measurement, Statistics & Evaluation; Human Development; and the Institute for Child Study.
Search Results (4 items)
Item: Detecting Local Item Dependence in Polytomous Adaptive Data (2011)
Mislevy, Jessica Lynn; Harring, Jeffrey R.; Rupp, Andre A.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
A rapidly expanding arena for item response theory (IRT) is attitudinal and health-outcomes survey applications, often with polytomous items; in particular, there is interest in computer adaptive testing (CAT). Realizing the benefits of IRT in this setting, however, requires that model assumptions be met. Although local item dependence (LID) has been investigated for polytomous items in fixed-form settings and for dichotomous items in CAT settings, no publications have applied LID detection methodology to polytomous items in CAT, despite its central importance to these applications. The research documented herein investigates, via a simulation study, the extension of two widely used LID detection methods, Yen's Q3 statistic and Pearson's X2 statistic, to this context. The simulation design and results are contextualized throughout with a real item bank and data set of this type from the Patient-Reported Outcomes Measurement Information System (PROMIS).
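A worked illustration of the Q3 idea: Yen's Q3 is the correlation, taken across examinees, of the residuals that remain after subtracting model-implied expected item scores, and large pairwise values flag locally dependent items. A minimal Python sketch, assuming expected scores are already available from a fitted (possibly polytomous) IRT model and using a hypothetical flagging cutoff of 0.2:

```python
# Minimal sketch of Yen's Q3 for flagging local item dependence (LID).
# Assumes model-implied expected scores from an already-fitted IRT model.
import numpy as np

def q3_matrix(responses, expected):
    """Pairwise Q3: correlations between items' residuals across examinees.

    responses : (n_persons, n_items) observed item scores (may be polytomous)
    expected  : (n_persons, n_items) expected scores at each person's ability estimate
    """
    residuals = responses - expected
    return np.corrcoef(residuals, rowvar=False)  # (n_items, n_items)

def flag_dependent_pairs(q3, cutoff=0.2):
    """Item pairs whose |Q3| exceeds a heuristic cutoff (an assumption of this sketch)."""
    n_items = q3.shape[0]
    return [(j, k, round(q3[j, k], 3))
            for j in range(n_items)
            for k in range(j + 1, n_items)
            if abs(q3[j, k]) > cutoff]

# Toy usage: random data and column means stand in for a calibrated model's output.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(500, 10)).astype(float)  # 5-category responses
E = np.tile(X.mean(axis=0), (500, 1))                 # placeholder expectations
print(flag_dependent_pairs(q3_matrix(X, E)))
```

In an adaptive setting each examinee answers a different subset of items, so the residuals are available only for the administered items, which is part of what makes extending these statistics to CAT nontrivial.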
Item: Exploring the Full-information Bifactor Model in Vertical Scaling with Construct Shift (2011)
Li, Ying; Lissitz, Robert W.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
To address the lack of attention to construct shift in IRT vertical scaling, a bifactor model is proposed to estimate the dimension common to all grades along with the grade-specific dimensions. The estimation accuracy of the bifactor model is evaluated through a simulation study with manipulated factors of percent of common items, sample size, and degree of construct shift. In addition, the unidimensional IRT (UIRT) estimation model that ignores construct shift is examined to represent current practice for IRT vertical scaling, and comparisons of parameter estimation accuracy between the bifactor and UIRT models are discussed. The major findings of the simulation study are: (1) bifactor models are well recovered overall, even though item discrimination parameters are underestimated to a small degree; (2) item discrimination parameters are overestimated in UIRT models due to the effect of construct shift; (3) person parameters of UIRT models are estimated less accurately than those of bifactor models, and the accuracy decreases as the degree of construct shift increases; and (4) group mean parameter estimates of UIRT models are less accurate than those of bifactor models, with a large effect of construct shift on the UIRT group mean estimates. The real data analysis illustrates how bifactor models can be applied to a vertical scaling problem with construct shift. General procedures for testing practice are also discussed.

Item: IMPACTS OF LOCAL ITEM DEPENDENCE OF TESTLET ITEMS WITH THE MULTISTAGE TESTS FOR PASS-FAIL DECISIONS (2010)
Lu, Ru; Jiao, Hong; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The primary purpose of this study is to investigate the impact of local item dependence (LID) among testlet items on the performance of multistage tests (MST) that make pass/fail decisions. In this study, LID is simulated in testlet items; testlet items are those that physically share the same stimulus. In the MST design, the proportion of testlet items is a manipulated factor; other studied factors include testlet item position, LID magnitude, and test length. The second purpose of this study is to use a testlet response model to account for LID in the context of MSTs, and the possible gains of using a testlet model over a standard IRT model are evaluated. The results indicate that under the simulated conditions, testlet item position has a minimal effect on the precision of ability estimation and on decision accuracy, while the item pool structure (the proportion of testlet items), the LID magnitude, and the test length have fairly substantial effects. Ignoring the LID effects and fitting a unidimensional 3PL model results in a loss of ability estimation precision and decision accuracy. Ability estimation is adversely affected by larger proportions of testlet items, by moderate and high LID levels, and by short test lengths. As the LID condition worsens (large LID magnitude or a large proportion of testlet items), decision accuracy rates decrease. Fitting a 3PL testlet response model does not reach the same level of ability estimation precision under all simulation conditions; rather, the results indicate that ignoring LID and fitting the standard 3PL model yields inflated estimates of ability estimation precision and decision accuracy.
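A common form of the testlet response model augments the standard 3PL model with a person-by-testlet effect whose variance governs the LID magnitude among items sharing a stimulus. A minimal simulation sketch in Python; all parameter values and dimensions below are arbitrary assumptions chosen for illustration:

```python
# Minimal sketch: simulating dichotomous responses from a 3PL testlet response
# model, where a person-by-testlet effect gamma induces local item dependence
# among items that share a stimulus. Parameter values are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items, n_testlets = 1000, 12, 3

a = rng.lognormal(0.0, 0.3, n_items)      # discrimination
b = rng.normal(0.0, 1.0, n_items)         # difficulty
c = np.full(n_items, 0.2)                 # pseudo-guessing
testlet_of = np.repeat(np.arange(n_testlets), n_items // n_testlets)

theta = rng.normal(0.0, 1.0, n_persons)   # ability
lid_sd = 0.8                              # LID magnitude: sd of testlet effects
gamma = rng.normal(0.0, lid_sd, (n_persons, n_testlets))

# 3PL testlet model: P = c + (1 - c) / (1 + exp(-a * (theta - b - gamma)))
z = a * (theta[:, None] - b - gamma[:, testlet_of])
p = c + (1.0 - c) / (1.0 + np.exp(-z))
responses = (rng.random((n_persons, n_items)) < p).astype(int)

print(responses.mean(axis=0))  # observed proportions correct per item
```

Setting lid_sd to zero recovers a standard 3PL data-generating model, which is one way a simulation of this kind can manipulate the LID magnitude factor.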
Item: REWEIGHTING DATA IN THE SPIRIT OF TUKEY: USING BAYESIAN POSTERIOR PROBABILITIES AS RASCH RESIDUALS FOR STUDYING MISFIT (2010)
Dardick, William Ross; Mislevy, Robert J.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
A new variant of the iterative "data = fit + residual" data-analytical approach described by Mosteller and Tukey is proposed and implemented in the context of item response theory psychometric models. Posterior probabilities from a Bayesian mixture model of a Rasch item response theory model and an unscalable latent class are expressed as weights for the original data. The data weighted by the units' posterior probabilities for the unscalable class are used for further exploration of structure. Data were generated in accordance with departures from the Rasch model that have been studied in the literature. Factor analysis models are compared for the original data and for the data reweighted by the posterior probabilities for the unscalable class. Eigenvalues are compared against Horn's parallel analysis for each class of factor models to determine the number of factors in a dataset. In comparing the two weighted data sets, the Rasch-weighted data and the data weighted toward the unscalable class, clear differences are manifest. Pattern types detected for the Rasch baselines differ from those of random or systematic contamination. The Rasch baseline patterns are strongest around item difficulties closest to the mean generating value of the θ's. Patterns in baseline conditions weaken as item difficulties depart from zero and move toward extreme values of ±6. The random contamination factor patterns are typically flat and near zero regardless of the item difficulty with which they are associated. Systematic contamination using reversed Rasch-generated data produces patterns that differ from the Rasch baseline condition and, in some conditions, shows an effect opposite to the Rasch patterns. Differences can also be detected within the residually weighted data between the Rasch-generated subtest and the contaminated subtest. In conditions with identified factors, the Rasch subtest often shows Rasch patterns while the contaminated subtest shows some form of random/flat or systematic/reversed pattern.
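The reweighting step can be sketched concretely: for each respondent, compute the posterior probability of membership in the unscalable class under the two-class mixture and use it as a weight on that respondent's row of the data matrix. A minimal Python sketch under simplifying assumptions, namely a Rasch likelihood evaluated at a fixed ability estimate per person, an unscalable class modeled as independent 0.5-probability responses, and a hypothetical prior of 0.1 for the unscalable class:

```python
# Minimal sketch: posterior probabilities from a two-class mixture (Rasch class
# vs. an "unscalable" class) used as weights for the original response data.
# The fixed ability estimates, coin-flip unscalable class, and prior of 0.1
# are simplifying assumptions of this sketch.
import numpy as np

def rasch_prob(theta, b):
    """Rasch probability of a correct response for each person-item pair."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))

def unscalable_class_weights(X, theta_hat, b, prior_unscalable=0.1):
    """Posterior probability that each person belongs to the unscalable class."""
    p = rasch_prob(theta_hat, b)
    log_lik_rasch = np.sum(X * np.log(p) + (1 - X) * np.log(1 - p), axis=1)
    log_lik_unscalable = X.shape[1] * np.log(0.5)  # independent 0.5 responses
    num = prior_unscalable * np.exp(log_lik_unscalable)
    den = num + (1 - prior_unscalable) * np.exp(log_lik_rasch)
    return num / den

# Toy usage: Rasch-generated data, then each row weighted by its posterior.
rng = np.random.default_rng(2)
b = rng.normal(0.0, 1.0, 15)
theta_hat = rng.normal(0.0, 1.0, 300)
X = (rng.random((300, 15)) < rasch_prob(theta_hat, b)).astype(int)

w = unscalable_class_weights(X, theta_hat, b)
X_reweighted = w[:, None] * X  # one simple way to apply the weights
print(w[:5])
```

The reweighted matrix would then be carried forward to exploratory analyses such as the factor analyses and Horn's parallel analysis described in the abstract.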