Theses and Dissertations from UMD
Permanent URI for this community: http://hdl.handle.net/1903/2
New submissions to the thesis/dissertation collections are added automatically as they are received from the Graduate School. Currently, the Graduate School deposits all theses and dissertations from a given semester after the official graduation date. This means that there may be up to a four-month delay before a given thesis or dissertation appears in DRUM.
More information is available at Theses and Dissertations at University of Maryland Libraries.
25 results
Search Results
Item: Exploring Unidimensional Proficiency Classification Accuracy From Multidimensional Data in a Vertical Scaling Context (2010)
Kroopnick, Marc Howard; Mislevy, Robert J; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
When Item Response Theory (IRT) is operationally applied for large-scale assessments, unidimensionality is typically assumed. This assumption requires that the test measure a single latent trait. Furthermore, when tests are vertically scaled using IRT, the assumption of unidimensionality requires that the battery of tests across grades measure the same trait, just at different levels of difficulty. Many researchers have shown that this assumption may not hold for certain test batteries, and therefore the results from applying a unidimensional model to multidimensional data may be called into question. This research investigated the impact on classification accuracy when multidimensional vertical scaling data are estimated with a unidimensional model. The multidimensional compensatory two-parameter logistic model (MC2PL) was the data-generating model for two levels of a test administered to simulees of correspondingly different abilities. Simulated data from the MC2PL model were estimated according to a unidimensional two-parameter logistic (2PL) model, and classification decisions were made from a simulated bookmark standard-setting procedure based on the unidimensional estimation results. Those unidimensional classification decisions were compared to the "true" unidimensional classification (proficient or not proficient) of simulees in multidimensional space, obtained by projecting a simulee's generating two-dimensional theta vector onto a unidimensional scale via a number-correct transformation on the entire test battery (i.e., across both grades). Specifically, conditional classification accuracy measures were considered; that is, the proportion of truly not proficient simulees classified correctly and the proportion of truly proficient simulees classified correctly were the criterion variables. Manipulated factors in this simulation study included the confound of item difficulty with dimensionality, the difference in mean abilities on both dimensions of the simulees taking each test in the battery, the choice of common items used to link the exams, and the correlation of the two abilities. Results suggested that the correlation of the two abilities and the confound of item difficulty with dimensionality both had an effect on the conditional classification accuracy measures. There was little or no evidence that the choice of common items or the differences in mean abilities of the simulees taking each test had an effect.

Item: A Mixture Rasch Model with a Covariate: A Simulation Study via Bayesian Markov Chain Monte Carlo Estimation (2009)
Dai, Yunyun; Mislevy, Robert J; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Mixtures of item response theory models have been proposed as a technique to explore response patterns in test data related to cognitive strategies, instructional sensitivity, and differential item functioning (DIF). Estimation proves challenging due to difficulties in identification and questions of the effect size needed to recover underlying structures. In particular, the impact of auxiliary variables, or covariates, for examinees in estimation has not been systematically explored.
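For orientation, a common way to set up a mixture Rasch model with an examinee covariate of this kind (a generic sketch with illustrative notation, not necessarily the exact specification used in the dissertation) is to give each latent class its own item difficulties and to model class membership through a logistic regression on the covariate:

\[
P(X_{ij}=1 \mid \theta_i, c_i = g) = \frac{\exp(\theta_i - b_{jg})}{1+\exp(\theta_i - b_{jg})},
\qquad
P(c_i = g \mid z_i) = \frac{\exp(\beta_{0g} + \beta_{1g} z_i)}{\sum_{h=1}^{G}\exp(\beta_{0h} + \beta_{1h} z_i)},
\]

where b_{jg} is the difficulty of item j in class g, z_i is the examinee covariate, and the beta terms determine how the covariate shifts the class-membership probabilities.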
The goal of this dissertation is to carry out a systematically designed simulation study to investigate the performance of the mixture Rasch model (MRM) under Bayesian estimation using the Markov chain Monte Carlo (MCMC) method. The dependent variables in this study are (1) the proportion of cases in which the generating mixture structure is recovered, and (2) among those cases in which the structure is recovered, the bias and root mean squared error of the parameter estimates. The foci of the study are to use a flexible logistic regression model to parameterize the relation between latent class membership and the examinee covariate, to study MCMC estimation behavior in light of effect size, and to provide insights and suggestions on model application and model estimation.

Item: Nonparticipation of the 12th Graders in the National Assessment of Educational Progress: Understanding Determinants of Nonresponse and Assessing the Impact on NAEP Estimates of Nonresponse Bias According to Propensity Models (2009)
Chun, Young I.; Abraham, Katharine; Robinson, John; Sociology; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
This dissertation examines nonparticipation of 12th graders in the year 2000 National Assessment of Educational Progress (NAEP), using a model of nonresponse developed by Groves and Couper (1998). NAEP is a continuing assessment of American students' knowledge in various subject areas, including mathematics and science, and the possibility that its results could be contaminated by a low response rate was regarded as very serious. The dissertation evaluates the statistical impact of nonparticipation bias on estimates of educational performance in NAEP by applying response propensity models to the NAEP mathematics and science survey data and the corresponding school administrative data from over 20,000 seniors in the 2000 High School Transcript Study (HSTS). When NAEP and HSTS are merged, one has measures of individual- and school-level characteristics for nonparticipants as well as participants. Results indicate that nonresponse was not a serious contaminant; applying response-propensity-based weights led to only about a 1-point difference, on average, out of 500 points in mathematics and 300 points in science. The results support other recent research (e.g., Curtin, Presser and Singer, 2000; Groves, 2006) showing minimal effects of lowered response rates on nonresponse bias.

Item: An Information Correction Method for Testlet-Based Test Analysis: From the Perspectives of Item Response Theory and Generalizability Theory (2009)
Li, Feifei; Mislevy, Robert J.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
An information correction method for testlet-based tests is introduced in this dissertation. This method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error for the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By using a design effect ratio composed of random variances that can be easily derived from a GT analysis, it becomes possible to adjust the underestimated measurement error from the unidimensional IRT model to a more appropriate level. It is demonstrated how the information correction method can be implemented in the context of a testlet design.
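The abstract does not reproduce the correction formula itself; the general logic of a design-effect adjustment of this kind (sketched here with illustrative notation, not the dissertation's exact expression) is to inflate the IRT-based error variance by a ratio of GT error variances computed with and without the person-by-testlet component:

\[
\hat{\sigma}^2_{\mathrm{adj}}(\hat{\theta}) \approx d \cdot \hat{\sigma}^2_{\mathrm{IRT}}(\hat{\theta}),
\qquad
d = \frac{\sigma^2_{\mathrm{error}}(\text{testlet design})}{\sigma^2_{\mathrm{error}}(\text{independent-items design})} \ge 1,
\]

so that stronger local item dependence yields a larger d and, in turn, a larger and more realistic standard error.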
Through the simulation study, it is shown that the underestimated measurement errors of proficiency parameters from IRT calibration could be adjusted to the appropriate level despite the varying magnitude of local item dependence (LID), testlet length, balance of testlet length, and the number of item parameters in the model. Each of the three factors (i.e., LID, testlet length, and balance of testlet length) and their interactions had statistically significant effects on the error adjustment. The real data example provides more details about when and how the information correction should be used in a test analysis. Results are evaluated by comparing the measurement errors from the IRT model with those from the testlet response theory (TRT) model. Given the robustness of the variance ratio, estimation of the information correction should be adequate for practical work.

Item: An Integrated Item Response Model for Evaluating Individual Students' Growth in Educational Achievement (2009)
Koran, Jennifer; Hancock, Gregory R.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Measuring continuous change or growth in individual students' academic abilities over time currently uses several statistical models or transformations to move from data representing a student's correct or incorrect responses on individual test items to inferences about the form and quantity of change in the student's underlying ability. This study proposed and investigated a single integrated model of underlying growth within an Item Response Theory framework as a potential alternative to this approach. A Monte Carlo investigation explored parameter recovery for marginal maximum likelihood estimates via the Expectation-Maximization algorithm under variations of several conditions, including the form of the underlying growth trajectory, the amount of inter-individual variation in the rate(s) of growth, the sample size, the number of items at each time point, and the selection of items administered across time points. A real data illustration with mathematics assessment data from the Early Childhood Longitudinal Study showed the practical use of this integrated model for measuring gains in academic achievement. Overall, this exploration of an integrated model approach contributed to a better understanding of the appropriate use of growth models to draw valid inferences about students' academic growth over time.

Item: An Empirical Investigation of Unscalable Components in Scaling Models (2009)
Braaten, Kristine Norene; Dayton, C. M.; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Guttman (1947) developed a scaling method in which the items measuring an attribute can be ordered according to the strength of the attribute. The Guttman scaling model assumes that every member of the population belongs to a scale type and does not allow for response errors. The Proctor (1970) and intrusion-omission (Dayton and Macready, 1976) models introduced the notion that observed response patterns deviate from Guttman scale types because of response error. The Goodman (1975) model posited that part of the population is intrinsically unscalable. The extended Proctor and intrusion-omission models (Dayton and Macready, 1980), commonly called extended Goodman models, include both response error and an intrinsically unscalable class (IUC).
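For reference, the error structure of these models can be sketched as follows (illustrative notation, not taken from the dissertation): the Proctor model assumes that every disagreement between an observed response pattern x and a respondent's true scale type s occurs with a single error rate alpha,

\[
P(\mathbf{x} \mid s) = \alpha^{\,d(\mathbf{x},s)}\,(1-\alpha)^{\,J-d(\mathbf{x},s)},
\]

where J is the number of items and d(x, s) counts the items on which x disagrees with scale type s. The intrusion-omission model allows separate intrusion and omission error rates, and the Goodman-type extensions add a latent class whose members respond independently of any scale type.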
An alternative approach to the Goodman and extended Goodman models is the two-point mixture index of fit developed by Rudas, Clogg, and Lindsay (1994). The index, pi-star, is a descriptive measure used to assess fit when the data can be summarized in a contingency table for a hypothesized model. It is defined as the smallest proportion of cases that must be deleted from the observed frequency table to yield a perfect fit for the postulated model. In addition to contingency tables, pi-star can be applied to latent class models, including scaling models for dichotomous data. This study investigates the unscalable components in the extended Goodman models and in the two-point mixture where the hypothesized model is the Proctor or intrusion-omission model. The question of interest is whether the index of fit associated with the Proctor or intrusion-omission model provides a potential alternative to the IUC proportion for the extended Proctor or intrusion-omission model; in other words, whether or not pi-star and the IUC proportion are comparable. Simulation results in general did not support the notion that pi-star and the IUC proportion are comparable. Six-variable extended models outperformed their respective two-point mixture models with regard to the IUC proportion across almost every combination of condition levels. This was also true for the four-variable case, except that the pi-star models showed better overall performance when the true IUC proportion was small. A real data application illustrates the use of the models studied.

Item: Testing for Differentially Functioning Indicators Using Mixtures of Confirmatory Factor Analysis Models (2009)
Mann, Heather Marie; Hancock, Gregory R; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
Heterogeneity in measurement model parameters across known groups can be modeled and tested using multigroup confirmatory factor analysis (CFA). When it is not reasonable to assume that parameters are homogeneous for all observations in a manifest group, mixture CFA models are appropriate. Mixture CFA models can add theoretically important unmeasured characteristics to capture heterogeneity and have the potential to be used to test measurement invariance. The current study investigated the ability of mixture CFA models to identify differences in factor loadings across latent classes when there is no mean separation in either the latent or the measured variables. Using simulated data from models with known parameters, parameter recovery, classification accuracy, and the power of the likelihood-ratio test were evaluated as impacted by model complexity, sample size, latent class proportions, magnitude of factor loading differences, percentage of noninvariant factor loadings, and pattern of noninvariant factor loadings.
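For orientation, a mixture CFA measurement model of the kind described here can be written with class-specific loadings (a generic sketch; the dissertation's exact specification may differ):

\[
\mathbf{x}_i \mid (c_i = g) = \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\xi}_i + \boldsymbol{\delta}_i,
\qquad
\boldsymbol{\xi}_i \sim N(\boldsymbol{\kappa}_g, \boldsymbol{\Phi}_g),
\quad
\boldsymbol{\delta}_i \sim N(\mathbf{0}, \boldsymbol{\Theta}_g),
\]

where loading noninvariance means the loading matrices differ across at least two latent classes, and the "no mean separation" condition studied here corresponds to holding the latent means and measurement intercepts equal across classes, so that classes differ only in their loadings.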
Results suggested that mixture CFA models may be a viable option for testing the invariance of measurement model parameters, but in the absence of impact and differences in measurement intercepts, larger sample sizes, more noninvariant factor loadings, and larger amounts of heterogeneity are needed to distinguish different latent classes and successfully estimate their parameters.

Item: Developing a Common Scale for Testlet Model Parameter Estimates under the Common-item Nonequivalent Groups Design (2009)
Li, Dongyang; Mislevy, Robert J; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
An important advantage of item response theory (IRT) is increased flexibility in methods of test equating. Several methods of IRT scale linking, such as Haebara's linking procedure, have been developed, but under the assumption of local independence of item responses. A recent development in IRT has been the introduction of Testlet Response Theory (TRT) models, in which local dependence among related sets of items is accounted for by the incorporation of "testlet effect" parameters in the model. This study extended Haebara's item characteristic curve scale linking method to the three-parameter logistic (3-PL) testlet model. Quadrature points and weights were used to approximate the estimated distribution of the testlet effect parameters so that the expected score of each item given theta can be computed and the scale linking parameters can be estimated. A simulation study was conducted to examine the performance of the proposed scale linking procedure by comparing it with scale linking procedures based on the 3-PL IRT model and the graded response model. An operational data analysis was also performed to illustrate the application of the proposed scale linking method with real data.

Item: Mixed-format Test Equating: Effects of Test Dimensionality and Common-item Sets (2008-11-18)
Cao, Yi; Lissitz, Robert; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
The main purposes of this study were to systematically investigate the impact of representativeness and non-representativeness of common-item sets in terms of statistical, content, and format specifications in mixed-format tests using concurrent calibration with unidimensional IRT models, as well as to examine its robustness to various multidimensional test structures. To fulfill these purposes, a simulation study was conducted in which five factors were manipulated: test dimensionality structure, group ability distributions, and statistical, content, and format representativeness. The examinees' true and estimated expected total scores were computed, and BIAS, RMSE, and classification consistency indices over 100 replications were then compared. The major findings were summarized as follows: First, considering all of the simulation conditions, the most notable and significant effects on the equating results appeared to be those due to the factor of group ability distributions. The equivalent-groups condition always outperformed the nonequivalent-groups condition on the various evaluation indices. Second, regardless of the group ability differences, there were no statistically or practically significant interaction effects among the factors of statistical, content, and format representativeness.
Third, under the unidimensional test structure, the content and format representativeness factors showed little significant impact on the equating results, whereas the statistical representativeness factor affected the performance of the concurrent calibration significantly. Fourth, regardless of the various levels of multidimensional test structure, the statistical representativeness factor showed more significant and systematic effects on the performance of the concurrent calibration than the content and format representativeness factors did. When the degree of multidimensionality due to multiple item formats increased, the format representativeness factor began to make significant differences, especially under the nonequivalent-groups condition. The content representativeness factor, however, showed minimal impact on the equating results regardless of the increase in the degree of multidimensionality due to different content areas. Fifth, the concurrent calibration was not particularly robust to violation of unidimensionality, since the performance of the concurrent calibration with the unidimensional IRT models declined significantly as the degree of multidimensionality increased.

Item: Effects of Model Selection on the Coverage Probability of Confidence Intervals in Binary-Response Logistic Regression (2008-07-24)
Zhang, Dongquan; Dayton, C. Mitchell; Measurement, Statistics and Evaluation; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
While model selection is viewed as a fundamental task in data analysis, it has considerable effects on subsequent inference. In applied statistics, it is common to carry out a data-driven approach to model selection and to draw inference conditional on the selected model, as if it were given a priori. Parameter estimates following this procedure, however, generally do not reflect uncertainty about the model structure. As far as confidence intervals are concerned, it is often misleading to report estimates based upon the conventional 1−α level without considering possible post-model-selection impact. This paper addresses the coverage probability of confidence intervals for logit coefficients in binary-response logistic regression. We conduct simulation studies to examine the performance of the automatic model selectors AIC and BIC, and their subsequent effects on the actual coverage probability of interval estimates. Important considerations (e.g., model structure, covariate correlation) that may have key influence are investigated. This study contributes a quantitative understanding of how post-model-selection confidence intervals perform, in terms of coverage, in binary-response logistic regression models. A major conclusion was that, while the actual coverage probability is usually below the nominal level, there is no simple, predictable pattern in how and how far it may fall. The coverage probability varies given the effects of multiple factors: (1) While the model structure always plays a role of paramount importance, covariate correlation significantly affects the interval's coverage, with a tendency for higher correlation to produce lower coverage probability. (2) No evidence shows that AIC inevitably outperforms BIC in terms of achieving higher coverage probability, or vice versa; the model selector's performance depends upon the uncertain model structure and/or the unknown parameter vector θ.
(3) While the effect of sample size is intriguing, a larger sample size does not necessarily yield asymptotically more accurate inference for interval estimates. (4) Although the binary threshold of the logistic model may affect the coverage probability, this effect is less important; it is more likely to become substantial with an unrestricted model when extreme values along the dimensions of other factors (e.g., small sample size, high covariate correlation) are observed.
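As a point of reference for the quantities the abstract above refers to (standard textbook definitions, not results or notation from the dissertation), the conditional Wald interval for a logit coefficient and the two model selectors are usually written as

\[
\hat{\beta}_j \pm z_{1-\alpha/2}\,\widehat{SE}(\hat{\beta}_j),
\qquad
\mathrm{AIC} = -2\log L + 2k,
\qquad
\mathrm{BIC} = -2\log L + k\log n,
\]

with k the number of estimated parameters and n the sample size; the simulated coverage probability is then the proportion of replications in which the interval constructed around the selected model contains the true coefficient. Since n enters only the BIC penalty, the two selectors can diverge as sample size grows, which bears on finding (3).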
Page 1 of 3