Human Development & Quantitative Methodology

Permanent URI for this community: http://hdl.handle.net/1903/2248

The departments within the College of Education were reorganized and renamed as of July 1, 2011. This department incorporates the former Department of Measurement, Statistics & Evaluation, the former Department of Human Development, and the Institute for Child Study.

Search Results

Now showing 1 - 10 of 20
  • Item
    A MIXTURE RASCH MODEL WITH A COVARIATE: A SIMULATION STUDY VIA BAYESIAN MARKOV CHAIN MONTE CARLO ESTIMATION
    (2009) Dai, Yunyun; Mislevy, Robert J; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Mixtures of item response theory models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of the effect size needed to recover underlying structures. In particular, the impact of auxiliary variables, or covariates, for examinees on estimation has not been systematically explored. The goal of this dissertation is to carry out a systematically designed simulation study to investigate the performance of the mixture Rasch model (MRM) under Bayesian estimation using the Markov chain Monte Carlo (MCMC) method. The dependent variables in this study are (1) the proportion of cases in which the generating mixture structure is recovered, and (2) among those cases in which the structure is recovered, the bias and root mean squared error of the parameter estimates. The foci of the study are to use a flexible logistic regression model to parameterize the relation between latent class membership and the examinee covariate, to study MCMC estimation behavior in light of effect size, and to provide insights and suggestions on model application and model estimation.
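    As a rough illustration of the model family studied here, the sketch below (Python, not taken from the dissertation) writes the marginal likelihood of one examinee's responses under a mixture Rasch model in which a single covariate x enters class membership through a logistic regression; the reference-class coding, parameter names, and single-covariate form are assumptions made for the example.

        import numpy as np

        def class_membership_probs(x, gamma0, gamma1):
            # P(class = g | covariate x) via a multinomial logistic regression;
            # gamma0, gamma1 are length-G arrays, with the reference class's
            # coefficients fixed at 0 (a common identification choice).
            logits = gamma0 + gamma1 * x
            logits = logits - logits.max()          # numerical stability
            w = np.exp(logits)
            return w / w.sum()

        def rasch_prob(theta, b):
            # Rasch probability of a correct response for item difficulty b.
            return 1.0 / (1.0 + np.exp(-(theta - b)))

        def mixture_likelihood(responses, theta_by_class, b_by_class, x, gamma0, gamma1):
            # Marginal likelihood of one examinee's 0/1 response vector under a
            # mixture Rasch model with class-specific difficulties b_by_class[g]
            # and class-specific ability theta_by_class[g] for that examinee.
            pi_g = class_membership_probs(x, gamma0, gamma1)
            lik = 0.0
            for g, pi in enumerate(pi_g):
                p = rasch_prob(theta_by_class[g], b_by_class[g])
                lik += pi * np.prod(np.where(responses == 1, p, 1.0 - p))
            return lik

    In a Bayesian treatment such as the one in the study, this likelihood would be combined with priors and sampled via MCMC rather than evaluated directly.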
  • Item
    AN INFORMATION CORRECTION METHOD FOR TESTLET-BASED TEST ANALYSIS: FROM THE PERSPECTIVES OF ITEM RESPONSE THEORY AND GENERALIZABILITY THEORY
    (2009) Li, Feifei; Mislevy, Robert J.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    An information correction method for testlet-based tests is introduced in this dissertation. The method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error of the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By using a design effect ratio composed of random variances that can be easily derived from a GT analysis, the underestimated measurement error from the unidimensional IRT model can be adjusted to a more appropriate level. It is demonstrated how the information correction method can be implemented in the context of a testlet design. A simulation study shows that the underestimated measurement errors of proficiency parameters from IRT calibration can be adjusted to the appropriate level across varying magnitudes of local item dependence (LID), testlet lengths, balance of testlet length, and numbers of item parameters in the model. The three factors (i.e., LID, testlet length, and balance of testlet length) and their interactions have statistically significant effects on the error adjustment. A real data example provides more detail about when and how the information correction should be used in a test analysis. Results are evaluated by comparing the measurement errors from the IRT model with those from the testlet response theory (TRT) model. Given the robustness of the variance ratio, estimation of the information correction should be adequate for practical work.
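    The sketch below gives a back-of-the-envelope version of the general idea (Python, illustrative only): build a design-effect-style ratio from GT variance components and use it to inflate the standard error that a conditional-independence IRT model reports for theta. The particular cluster-sampling form of the ratio is an assumption for the example, not the ratio defined in the dissertation.

        import numpy as np

        def design_effect(var_between_testlet, var_within_testlet, testlet_length):
            # Illustrative design-effect ratio built from GT-style variance
            # components (hypothetical composition; the dissertation defines its
            # own ratio from the GT analysis of the testlet design).
            rho = var_between_testlet / (var_between_testlet + var_within_testlet)
            return 1.0 + (testlet_length - 1.0) * rho

        def corrected_se(se_irt, deff):
            # Inflate the (underestimated) IRT standard error of theta-hat by the
            # square root of the design effect, i.e. deflate the test information.
            return se_irt * np.sqrt(deff)

        # Example: rho = 0.2 within testlets of length 5 gives deff = 1.8,
        # so a reported SE of 0.30 becomes about 0.30 * sqrt(1.8) = 0.40.
        print(corrected_se(0.30, design_effect(0.2, 0.8, 5)))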
  • Item
    An Integrated Item Response Model for Evaluating Individual Students' Growth in Educational Achievement
    (2009) Koran, Jennifer; Hancock, Gregory R.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Measuring continuous change or growth in individual students' academic abilities over time currently requires several statistical models or transformations to move from data representing a student's correct or incorrect responses on individual test items to inferences about the form and quantity of change in the student's underlying ability. This study proposed and investigated a single integrated model of underlying growth within an Item Response Theory framework as a potential alternative to this approach. A Monte Carlo investigation explored parameter recovery for marginal maximum likelihood estimates via the Expectation-Maximization algorithm under variations of several conditions, including the form of the underlying growth trajectory, the amount of inter-individual variation in the rate(s) of growth, the sample size, the number of items at each time point, and the selection of items administered across time points. A real data illustration with mathematics assessment data from the Early Childhood Longitudinal Study showed the practical use of this integrated model for measuring gains in academic achievement. Overall, this exploration of an integrated model approach contributed to a better understanding of the appropriate use of growth models to draw valid inferences about students' academic growth over time.
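    A minimal sketch of the kind of response function such an integrated model implies, assuming a Rasch-type item and a linear latent trajectory theta(t) = intercept + slope * t (the study also examines other trajectory forms); the names and the specific functional form are illustrative.

        import numpy as np

        def growth_irt_prob(intercept, slope, time, b):
            # P(correct) for a Rasch-type item of difficulty b when the
            # examinee's ability follows a linear latent trajectory
            # theta(t) = intercept + slope * t.
            theta_t = intercept + slope * time
            return 1.0 / (1.0 + np.exp(-(theta_t - b)))

        # Example: an examinee with intercept -0.5 and slope 0.4 per wave
        # answering an item of difficulty 0.0 at waves 0, 1, and 2.
        print([round(growth_irt_prob(-0.5, 0.4, t, 0.0), 3) for t in (0, 1, 2)])

    Under marginal maximum likelihood with the EM algorithm, this probability would be integrated over an assumed population distribution of the intercepts and slopes.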
  • Item
    An Empirical Investigation of Unscalable Components in Scaling Models
    (2009) Braaten, Kristine Norene; Dayton, C. M.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Guttman (1947) developed a scaling method in which the items measuring an attribute can be ordered according to the strength of the attribute. The Guttman scaling model assumes that every member of the population belongs to a scale type and does not allow for response errors. The Proctor (1970) and the intrusion-omission (Dayton and Macready, 1976) models introduced the notion that observed response patterns deviate from Guttman scale types because of response error. The Goodman (1975) model posited that part of the population is intrinsically unscalable. The extended Proctor and intrusion-omission (Dayton and Macready, 1980) models, commonly called extended Goodman models, include both response error and an intrinsically unscalable class (IUC). An alternative approach to the Goodman and extended Goodman models is the two-point mixture index of fit developed by Rudas, Clogg, and Lindsay (1994). The index, pi-star, is a descriptive measure used to assess fit when the data can be summarized in a contingency table for a hypothesized model. It is defined as the smallest proportion of cases that must be deleted from the observed frequency table to result in a perfect fit for the postulated model. In addition to contingency tables, pi-star can be applied to latent class models, including scaling models for dichotomous data. This study investigates the unscalable components in the extended Goodman models and the two-point mixture where the hypothesized model is the Proctor or intrusion-omission model. The question of interest is whether the index of fit associated with the Proctor or intrusion-omission model provides a potential alternative to the IUC proportion for the extended Proctor or intrusion-omission model, or in other words, whether or not pi-star and the IUC proportion are comparable. Simulation results in general did not support the notion that pi-star and the IUC proportion are comparable. Six-variable extended models outperformed their respective two-point mixture models with regard to the IUC proportion across almost every combination of condition levels. This is also true for the four-variable case, except that the pi-star models showed overall better performance when the true IUC proportion is small. A real data application illustrates the use of the models studied.
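    For concreteness, the sketch below (illustrative Python, not the study's code) writes out the Guttman scale types, the Proctor single-error-rate pattern probability, and an extended (Goodman-type) pattern probability that mixes in an intrinsically unscalable class; the argument names and the independent-response form of the IUC are assumptions for the example.

        import numpy as np

        def guttman_scale_types(n_items):
            # All n_items + 1 Guttman scale types for items ordered from easiest
            # to hardest: type k passes exactly the k easiest items.
            return [tuple(1 if j < k else 0 for j in range(n_items))
                    for k in range(n_items + 1)]

        def proctor_prob(pattern, scale_type, alpha):
            # Proctor model: every disagreement with the true scale type is a
            # response error, all errors sharing a single rate alpha.
            errors = sum(p != s for p, s in zip(pattern, scale_type))
            return alpha ** errors * (1 - alpha) ** (len(pattern) - errors)

        def extended_goodman_prob(pattern, type_probs, alpha, pi_iuc, iuc_item_probs):
            # Extended (Goodman-type) model: with probability pi_iuc the case is
            # intrinsically unscalable and responds independently item by item;
            # otherwise it is a scalable case following the Proctor model.
            scalable = sum(tp * proctor_prob(pattern, st, alpha)
                           for st, tp in zip(guttman_scale_types(len(pattern)), type_probs))
            iuc = np.prod([q if x == 1 else 1 - q
                           for x, q in zip(pattern, iuc_item_probs)])
            return pi_iuc * iuc + (1 - pi_iuc) * scalable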
  • Item
    Testing for Differentially Functioning Indicators Using Mixtures of Confirmatory Factor Analysis Models
    (2009) Mann, Heather Marie; Hancock, Gregory R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Heterogeneity in measurement model parameters across known groups can be modeled and tested using multigroup confirmatory factor analysis (CFA). When it is not reasonable to assume that parameters are homogeneous for all observations in a manifest group, mixture CFA models are appropriate. Mixture CFA models can add theoretically important unmeasured characteristics to capture heterogeneity and have the potential to be used to test measurement invariance. The current study investigated the ability of mixture CFA models to identify differences in factor loadings across latent classes when there is no mean separation in either the latent or the measured variables. Using simulated data from models with known parameters, parameter recovery, classification accuracy, and the power of the likelihood-ratio test were evaluated as functions of model complexity, sample size, latent class proportions, magnitude of factor loading differences, percentage of noninvariant factor loadings, and pattern of noninvariant factor loadings. Results suggested that mixture CFA models may be a viable option for testing the invariance of measurement model parameters, but in the absence of impact or of differences in measurement intercepts, larger sample sizes, more noninvariant factor loadings, and larger amounts of heterogeneity are needed to distinguish the latent classes and successfully estimate their parameters.
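    A small illustrative generator for the kind of data examined here: two latent classes with identical factor and indicator means (no impact, equal intercepts) that differ only in some factor loadings. The sample size, class proportion, and loading values below are placeholders, not the study's design cells.

        import numpy as np

        rng = np.random.default_rng(2009)

        def simulate_mixture_cfa(n, class_prop, loadings_by_class, resid_sd=1.0):
            # One-factor indicator data from two latent classes that differ only
            # in factor loadings: the common factor is N(0, 1) in both classes
            # and all indicator intercepts are zero (no mean separation).
            classes = (rng.random(n) < class_prop).astype(int)
            eta = rng.normal(0.0, 1.0, n)
            loadings = np.where(classes[:, None] == 0,
                                loadings_by_class[0][None, :],
                                loadings_by_class[1][None, :])
            y = loadings * eta[:, None] + rng.normal(0.0, resid_sd, loadings.shape)
            return y, classes

        y, cls = simulate_mixture_cfa(
            n=1000, class_prop=0.5,
            loadings_by_class=[np.array([0.8, 0.8, 0.8, 0.8]),
                               np.array([0.8, 0.8, 0.5, 0.5])])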
  • Item
    Developing a Common Scale for Testlet Model Parameter Estimates under the Common-item Nonequivalent Groups Design
    (2009) Li, Dongyang; Mislevy, Robert J; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    An important advantage of item response theory (IRT) is the increased flexibility it offers for test equating. Several IRT scale-linking methods, such as Haebara's item characteristic curve procedure, have been developed, but they rest on the assumption of local independence among item responses. A recent development in IRT has been the introduction of Testlet Response Theory (TRT) models, in which local dependence among related sets of items is accounted for by incorporating "testlet effect" parameters in the model. This study extended Haebara's item characteristic curve scale-linking method to the three-parameter logistic (3-PL) testlet model. Quadrature points and weights were used to approximate the estimated distribution of the testlet effect parameters so that the expected score of each item given theta could be computed and the scale-linking parameters could be estimated. A simulation study was conducted to examine the performance of the proposed scale-linking procedure by comparing it with scale-linking procedures based on the 3-PL IRT model and the graded response model. An operational data analysis was also performed to illustrate the application of the proposed scale-linking method with real data.
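    The sketch below illustrates the flavor of such a procedure (Python with scipy, not the dissertation's estimator): the testlet-model item characteristic curve is averaged over quadrature nodes and weights standing in for the estimated testlet-effect distribution, and a Haebara-type criterion in the linking constants A and B is minimized over the common items. The transformation convention, the 1.7 scaling constant, and all argument names are assumptions for the example.

        import numpy as np
        from scipy.optimize import minimize

        def p3pl_testlet(theta, a, b, c, gamma):
            # 3-PL testlet model; gamma is the examinee-by-testlet effect.
            return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b - gamma)))

        def expected_icc(theta, a, b, c, gamma_nodes, gamma_wts):
            # Marginal P(correct | theta): average the testlet-model ICC over
            # quadrature nodes/weights for the testlet-effect distribution.
            return np.sum(gamma_wts * p3pl_testlet(theta, a, b, c, gamma_nodes))

        def haebara_loss(AB, items_new, items_old, theta_nodes, theta_wts, gnodes, gwts):
            # Haebara-type criterion: squared differences between old-scale ICCs
            # and new-scale ICCs after theta* = A*theta + B (a* = a/A, b* = A*b + B),
            # summed over common items and theta quadrature points.
            A, B = AB
            loss = 0.0
            for (a_n, b_n, c_n), (a_o, b_o, c_o) in zip(items_new, items_old):
                for t, w in zip(theta_nodes, theta_wts):
                    p_new = expected_icc(t, a_n / A, A * b_n + B, c_n, gnodes, gwts)
                    p_old = expected_icc(t, a_o, b_o, c_o, gnodes, gwts)
                    loss += w * (p_new - p_old) ** 2
            return loss

        # A_hat, B_hat = minimize(haebara_loss, x0=[1.0, 0.0],
        #                         args=(items_new, items_old, tn, tw, gn, gw)).x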
  • Item
    Mixed-format Test Equating: Effects of Test Dimensionality and Common-item Sets
    (2008-11-18) Cao, Yi; Lissitz, Robert; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The main purposes of this study were to systematically investigate the impact of representativeness and non-representativeness of common-item sets in terms of statistical, content, and format specifications in mixed-format tests using concurrent calibration with unidimensional IRT models, and to examine the robustness of that calibration to various multidimensional test structures. To fulfill these purposes, a simulation study was conducted in which five factors - test dimensionality structure, group ability distributions, and statistical, content, and format representativeness - were manipulated. The examinees' true and estimated expected total scores were computed, and BIAS, RMSE, and classification consistency indices over 100 replications were then compared. The major findings were summarized as follows: First, considering all of the simulation conditions, the most notable and significant effects on the equating results were those due to the factor of group ability distributions. The equivalent groups condition always outperformed the nonequivalent groups condition on the various evaluation indices. Second, regardless of the group ability differences, there were no statistically or practically significant interaction effects among the factors of statistical, content, and format representativeness. Third, under the unidimensional test structure, the content and format representativeness factors showed little significant impact on the equating results, whereas the statistical representativeness factor affected the performance of the concurrent calibration significantly. Fourth, regardless of the level of multidimensional test structure, the statistical representativeness factor showed more significant and systematic effects on the performance of the concurrent calibration than the content and format representativeness factors did. When the degree of multidimensionality due to multiple item formats increased, the format representativeness factor began to make significant differences, especially under the nonequivalent groups condition. The content representativeness factor, however, showed minimal impact on the equating results regardless of the increase in the degree of multidimensionality due to different content areas. Fifth, the concurrent calibration was not robust to violation of unidimensionality, since the performance of the concurrent calibration with the unidimensional IRT models declined significantly as the degree of multidimensionality increased.
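    The evaluation indices named above can be summarized roughly as follows (illustrative Python; a single cut score and simple definitions are assumed, which may differ from the study's exact indices).

        import numpy as np

        def bias_rmse(true_scores, est_scores_by_rep):
            # BIAS and RMSE of estimated expected total scores over replications;
            # est_scores_by_rep has shape (n_replications, n_examinees).
            err = est_scores_by_rep - true_scores[None, :]
            return err.mean(), np.sqrt((err ** 2).mean())

        def classification_consistency(true_scores, est_scores_by_rep, cut):
            # Proportion of examinee-by-replication cases classified on the same
            # side of a single cut score as their true expected total score.
            true_cat = true_scores >= cut
            est_cat = est_scores_by_rep >= cut
            return (est_cat == true_cat[None, :]).mean()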
  • Item
    Effects of Model Selection on the Coverage Probability of Confidence Intervals in Binary-Response Logistic Regression
    (2008-07-24) Zhang, Dongquan; Dayton, C. Mitchell; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    While model selection is viewed as a fundamental task in data analysis, it has considerable effects on subsequent inference. In applied statistics, it is common to carry out a data-driven approach to model selection and then draw inference conditional on the selected model, as if it were given a priori. Parameter estimates following this procedure, however, generally do not reflect uncertainty about the model structure. As far as confidence intervals are concerned, it is often misleading to report estimates based upon the conventional 1 − α level without considering possible post-model-selection impact. This paper addresses the coverage probability of confidence intervals for logit coefficients in binary-response logistic regression. We conduct simulation studies to examine the performance of the automatic model selectors AIC and BIC and their subsequent effects on the actual coverage probability of interval estimates. Important considerations (e.g., model structure, covariate correlation) that may have a key influence are investigated. This study contributes to understanding quantitatively how post-model-selection confidence intervals perform, in terms of coverage, in binary-response logistic regression models. A major conclusion was that, while the actual coverage probability is usually below the nominal level, there is no simple, predictable pattern in how far it may fall. The coverage probability varies with the effects of multiple factors: (1) While model structure always plays a role of paramount importance, covariate correlation significantly affects the interval's coverage, with higher correlation tending to produce lower coverage probability. (2) No evidence shows that AIC inevitably outperforms BIC in achieving higher coverage probability, or vice versa; the model selector's performance depends on the uncertain model structure and/or the unknown parameter vector θ. (3) While the effect of sample size is intriguing, a larger sample size does not necessarily yield asymptotically more accurate interval estimates. (4) Although the binary threshold of the logistic model may affect the coverage probability, this effect is less important; it is more likely to become substantial with an unrestricted model when extreme values along the dimensions of other factors (e.g., small sample size, high covariate correlation) are observed.
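    A toy version of this kind of simulation is sketched below (self-contained Python): logistic data are generated with a correlated, truly null second covariate, the full and reduced models are compared by AIC or BIC, and the coverage of the 95% Wald interval for the first slope is tracked after selection. All condition values are made up for illustration and do not reproduce the study's design.

        import numpy as np

        rng = np.random.default_rng(1)

        def fit_logit(X, y, n_iter=50):
            # Maximum-likelihood logistic regression via Newton-Raphson; returns
            # coefficients, Wald standard errors, and the maximized log-likelihood.
            beta = np.zeros(X.shape[1])
            for _ in range(n_iter):
                p = 1.0 / (1.0 + np.exp(-X @ beta))
                H = X.T @ (X * (p * (1 - p))[:, None])
                beta = beta + np.linalg.solve(H, X.T @ (y - p))
            p = 1.0 / (1.0 + np.exp(-X @ beta))
            ll = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
            se = np.sqrt(np.diag(np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))))
            return beta, se, ll

        def coverage_after_selection(n=200, beta_true=(0.5, 1.0, 0.0), rho=0.5,
                                     reps=500, use_bic=False):
            # Coverage of the 95% Wald CI for the first slope after the second
            # covariate is kept or dropped by AIC (or BIC).
            z = 1.959963984540054
            cover = 0
            for _ in range(reps):
                x1 = rng.normal(size=n)
                x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
                eta = beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2
                y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
                Xfull = np.column_stack([np.ones(n), x1, x2])
                bF, seF, llF = fit_logit(Xfull, y)
                bR, seR, llR = fit_logit(Xfull[:, :2], y)
                penalty = np.log(n) if use_bic else 2.0
                icF, icR = -2 * llF + penalty * 3, -2 * llR + penalty * 2
                b, se = (bF, seF) if icF < icR else (bR, seR)
                cover += (b[1] - z * se[1] <= beta_true[1] <= b[1] + z * se[1])
            return cover / reps

        print(coverage_after_selection())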
  • Item
    IRT vs. Factor Analysis Approaches in Analyzing Multigroup Multidimensional Binary Data: The Effect of Structural Orthogonality, and the Equivalence in Test Structure, Item Difficulty, & Examinee Groups
    (2008-05-30) Lin, Peng; Lissitz, Robert W; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The purpose of this study was to investigate the performance of different approaches to analyzing multigroup multidimensional binary data under different conditions. Two multidimensional item response theory (MIRT) methods (concurrent MIRT calibration and separate MIRT calibration with linking) and one factor analysis method (concurrent factor analysis calibration) were examined. The performance of the unidimensional IRT method relative to its multidimensional counterparts was also investigated. The study was based on simulated data. A common-item nonequivalent groups design was employed with the manipulation of four factors: structural orthogonality, equivalence of test structure, equivalence of item difficulty, and equivalence of examinee groups. The performance of the methods was evaluated based on the recovery of the item parameters and the estimation of the true score of the examinees. The results indicated that, in general, the concurrent factor analysis method performed as well as, and sometimes even better than, the two MIRT methods in recovering the item parameters. However, in estimating the true score of examinees, the concurrent MIRT method usually performed better than the concurrent factor analysis method. The results also indicated that the unidimensional IRT method was quite robust to violation of the unidimensionality assumption.
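    As a concrete example of the kind of data this comparison involves, the sketch below generates binary responses for two examinee groups from a compensatory two-dimensional MIRT model; the loading pattern, group mean shift, and latent correlation are illustrative values, not the study's conditions.

        import numpy as np

        rng = np.random.default_rng(42)

        def simulate_mirt_binary(n, a_matrix, d_vector, theta_mean, theta_corr):
            # Compensatory two-dimensional MIRT:
            # P(correct) = logistic(a1*theta1 + a2*theta2 + d).
            cov = np.array([[1.0, theta_corr], [theta_corr, 1.0]])
            theta = rng.multivariate_normal(theta_mean, cov, size=n)   # (n, 2)
            eta = theta @ a_matrix.T + d_vector                        # (n, n_items)
            return (rng.random(eta.shape) < 1.0 / (1.0 + np.exp(-eta))).astype(int)

        # Twenty items: the first ten load mainly on dimension 1, the rest on dimension 2.
        a = np.column_stack([np.repeat([1.2, 0.3], 10), np.repeat([0.3, 1.2], 10)])
        d = rng.normal(0.0, 0.8, 20)
        reference = simulate_mirt_binary(2000, a, d, theta_mean=[0.0, 0.0], theta_corr=0.3)
        focal = simulate_mirt_binary(2000, a, d, theta_mean=[-0.3, -0.3], theta_corr=0.3)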
  • Item
    The Multidimensional Generalized Graded Unfolding Model for Assessment of Change across Repeated Measures
    (2008-05-13) Cui, Weiwei; Roberts, James S; Dayton, Chan M; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    A multidimensional extension of the generalized graded unfolding model for repeated measures (GGUM-RM) is introduced and applied to analyze attitude change across time using responses collected with a Thurstone or Likert questionnaire. The model conceptualizes the change across time as separate latent variables and provides direct estimates of both individual and group change while accounting for the dependency among the latent variables. The parameters and hyperparameters of the GGUM-RM are estimated by a fully Bayesian estimation method via WinBUGS. The accuracy of the estimation procedure is demonstrated by a simulation study, and the application of the GGUM-RM is illustrated by an analysis of attitude change toward abortion among college students.
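    For reference, the sketch below codes the category probabilities of the standard unidimensional GGUM (Roberts, Donoghue, & Laughlin), the model that the GGUM-RM above extends; the repeated-measures, multidimensional structure and the WinBUGS estimation are not reproduced here, and the function name and argument layout are assumptions for the example.

        import numpy as np

        def ggum_category_probs(theta, alpha, delta, tau):
            # Category probabilities for one item under the unidimensional GGUM.
            # tau has C + 1 entries for categories z = 0, ..., C, with tau[0] = 0;
            # M = 2C + 1 is the number of (observable plus unobservable) response
            # levels minus one in the underlying unfolding formulation.
            C = len(tau) - 1
            M = 2 * C + 1
            z = np.arange(C + 1)
            cum_tau = np.cumsum(tau)                   # sum_{k=0}^{z} tau_k
            num = (np.exp(alpha * (z * (theta - delta) - cum_tau))
                   + np.exp(alpha * ((M - z) * (theta - delta) - cum_tau)))
            return num / num.sum()

        # Example: a 4-category item centered at delta = 0 for theta = 0.5.
        print(ggum_category_probs(0.5, alpha=1.0, delta=0.0, tau=[0.0, -1.5, -1.0, -0.5]))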